attachment to the Notification of Missing Requirements dated July 16, 2001 (copy enclosed). 

In this particular case, it is not appropriate to translate the Japanese characters on the 
drawings into English. An important aspect of the invention is the recognition of the shape of 
Phoenix, Arizona 85004-0001 

DESCRIPTION 

RETRIEVAL METHOD, RETRIEVAL DEVICE, AND RECORDING MEDIUM 

TECHNICAL FIELD 

The present invention relates to a retrieval method, 
a retrieval device, and a recording medium for retrieving 
a character string that matches a search keyword from 
document data obtained by subjecting an original 
document to character recognition. 

BACKGROUND ART 

Japanese Laid-open Publication No. 7-152774, 
entitled "DOCUMENT SEARCHING METHOD AND DEVICE", discloses 
a conventional technique for searching document data, 
which is obtained by subjecting a document to character 
recognition, for data relevant to a designated character 
string. 

Figure 23 shows a relationship between an original 
document and a result of character recognition of the 
original document. A result of character recognition is 
also referred to as a recognition result. In general, 
character recognition is adversely affected by the 
faintness, angle, style, size, and the like of characters 
printed on paper. 

Figure 23 shows an example in which a character 
in the original document is incorrectly recognized as 
another character "^C". Further, a character "P" in the 
original document is incorrectly recognized as another 
character "B |2". 

Hereinafter, a process of searching the recognition 
result (Figure 23) for a character string "B^" will be 
described based on the technique described in the above 
Japanese Laid-open Publication No. 7-152774. 

This retrieval process uses a table (Table 1) 
indicating misrecognized characters. The table indicating 
misrecognized characters lists certain characters that 
tend to be incorrectly recognized by character 
recognition. Table 1 shows that a character 
tends to be incorrectly recognized as "^", "^fc", or 
"^", and that a character "□" tends to be incorrectly 
recognized as "□" (a symbolic quadrangle), "13", "P3", or 



Table 1 

Subject character    Misrecognized characters 
3r 
P 

When the recognition result of Figure 23 is searched 
for a character string "0^", character strings "B'fc", 
"B^C", "B&", and "0^" are produced from the character 
string "H$" using the table (Table 1) indicating 
misrecognized characters. In addition to the designated 
character string "B^", the character strings "03z", 
"H^", and "B#" are searched for. Therefore, "B*", which 
is the incorrectly recognized form of "H&", can be 
retrieved. 
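The conventional table-based expansion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of Publication No. 7-152774 itself: the confusion table `CONFUSIONS` and its Latin stand-in characters are hypothetical (the Japanese characters of Table 1 are not reproduced here), and generating every combination of alternatives is an assumption about how the variant strings are produced.

```python
from itertools import product

# Hypothetical confusion table in the spirit of Table 1: each subject
# character maps to the characters it tends to be misrecognized as.
# Latin letters stand in for the Japanese characters of the original.
CONFUSIONS = {
    "B": ["8", "R"],
    "O": ["0", "Q"],
}


def expand_query(query, confusions):
    """Return the query plus every variant built from the confusion table."""
    # For each position: the original character or any listed misrecognition.
    options = [[ch] + confusions.get(ch, []) for ch in query]
    return ["".join(combo) for combo in product(*options)]


def find_all(text, pattern):
    """Return every start index at which pattern occurs in text."""
    return [i for i in range(len(text) - len(pattern) + 1)
            if text.startswith(pattern, i)]


def table_search(text, query, confusions):
    """Search text for the query and all of its table-derived variants."""
    hits = set()
    for variant in expand_query(query, confusions):
        hits.update(find_all(text, variant))
    return sorted(hits)


recognized = "xx8Qyy BE"
print(table_search(recognized, "BO", CONFUSIONS))  # [2]
```

The number of variants grows multiplicatively with the query length (nine for a two-character query here), while a misrecognition absent from the table, such as "BE" for "BO", is never generated.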

However, in the retrieval process described in the 
above Japanese Laid-open Publication No. 7-152774, a list 
of characters that tend to be incorrectly recognized is 
prepared in advance. Therefore, when searching data having 
few errors, an excessive search may be executed using 
unnecessary character candidates. Conversely, when 
searching document data having many errors, misrecognized 
characters other than those on the list may not be retrieved. 

For example, in the example shown in Figure 23, when 
the recognition result is searched for a character string 
"AD", the character strings "A□" (with a symbolic quadrangle), 
"A®", "AW", and "A-" are produced using the table (Table 1) 
indicating misrecognized characters, and each of these 
character strings is searched for. However, when an error 
that is not listed in the table (Table 1) indicating 
misrecognized characters occurs (e.g., "D" is incorrectly 
recognized as "E"), it is not possible to retrieve "AlS". 

Further, when document data obtained by recognizing 
characters in a general document having a certain layout 
is searched for a character string, the layout itself 
might have been incorrectly recognized (e.g., vertical 
writing is incorrectly recognized as horizontal writing or 
vice versa; the line to be concatenated after a line 
feed is incorrectly identified; the concatenation between 
paragraphs is incorrectly recognized; and the like). 
Such layout recognition errors cannot be addressed by the 
retrieval method described in the above Japanese Laid-open 
Publication No. 7-152774. 

For example, consider a case where an original document 
having the layout shown in Figure 24 is subjected to 
character recognition. In Figure 24, the proper order of 
the paragraphs is an upper right paragraph, an upper left 
paragraph, a lower right paragraph, and a lower left 
paragraph. However, in the process of character 
recognition, the order of the paragraphs may be incorrectly 
recognized, so that, for example, the lower right paragraph 
is incorrectly concatenated with the upper right paragraph. 

In this case, when the recognition result is 
searched for a character string "B^^Ad", individual 
characters can be searched for using a table indicating 
misrecognized characters or the like. However, when the 
concatenation of paragraphs is incorrect, the recognition 
result becomes, for example, "...B^CD^SI^fSj..." as shown 
in Figure 25. Therefore, the character string "B$©AP" 
cannot be retrieved. 

The present invention is provided to resolve the 
above-described problems. The objectives of the present 
invention are: 

(1) to provide a retrieval method in which a search 
can be performed while dynamically changing the tolerance 
level for recognition errors depending on a recognition 
result, and a retrieval device and a recording medium; and 

(2) to provide a retrieval method in which a 
character string can be correctly retrieved from a 
recognition result even when the layout of a document is 
incorrectly recognized, and a retrieval device and a 
recording medium. 

DISCLOSURE OF THE INVENTION 
A retrieval method of the present invention is 
provided for searching a first character element string, 
obtained by subjecting a character string to character 
recognition, for a second character element string. The 
first character element string includes a first character 
element and the second character element string includes 
a second character element. A distance relevant to a 
similarity between the first character element and the 
second character element is predetermined between the first 
character element and the second character element. The 
retrieval method comprises the steps of comparing the 
distance with a first predetermined reference distance, and 
determining whether the second character element matches 
the first character element based on a result of the 
comparison of the distance with the first predetermined 
reference distance. Therefore, the above-described 
objective is achieved. 
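To make the distance comparison concrete, the following is a minimal sketch, not the claimed implementation. The table `DISTANCES`, the helper names, and the Latin stand-ins for Japanese character elements are hypothetical; the values 10 and 172 and the reference distance 150 echo the numbers used later in Example 1. Treating identical elements as distance 0 and unlisted pairs as infinitely distant are assumptions of the sketch.

```python
# Hypothetical distance table between character elements; a larger
# distance means less similar. Latin stand-ins replace Japanese elements.
DISTANCES = {
    ("A", "A'"): 10,
    ("A", "Q"): 172,
}


def distance(a, b, table):
    """Look up the predetermined distance between two character elements.

    Assumptions for this sketch: identical elements are at distance 0, the
    table is symmetric, and pairs missing from the table are infinitely far.
    """
    if a == b:
        return 0
    return table.get((a, b), table.get((b, a), float("inf")))


def element_matches(first_elem, second_elem, table, reference):
    """Treat the keyword element (second) as matching the recognized element
    (first) when their distance is below the reference distance."""
    return distance(first_elem, second_elem, table) < reference


print(element_matches("A'", "A", DISTANCES, 150))  # True  (10 < 150)
print(element_matches("Q", "A", DISTANCES, 150))   # False (172 >= 150)
```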






For the first character element, a reliability of 
character recognition may be predetermined, and the first 
predetermined reference distance may be determined based 
on the reliability. 

The first predetermined reference distance may be 
determined based on user input. 


The retrieval method may further comprise the steps 
of changing the first predetermined reference distance to 
a second reference distance, comparing the distance with 
the second reference distance, and determining whether the 
second character element matches the first character 
element based on a result of the comparison of the distance 
with the second reference distance. 
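The reliability-dependent reference distance and the switch to a second reference distance might be sketched as below. The linear mapping in `reference_from_reliability` and its bounds `low` and `high` are hypothetical choices made only for illustration; the text states that the reference distance may be determined from the reliability, not how. The table and element names are likewise hypothetical stand-ins.

```python
# Hypothetical distance table (Latin stand-ins for Japanese elements).
DISTANCES = {("A", "A'"): 10, ("A", "Q"): 172}


def distance(a, b, table):
    """Symmetric lookup; identical elements are 0, unknown pairs are infinite."""
    if a == b:
        return 0
    return table.get((a, b), table.get((b, a), float("inf")))


def reference_from_reliability(reliability, low=50.0, high=200.0):
    """Hypothetical linear mapping from recognition reliability in [0, 1]
    to a reference distance: less reliable elements get more tolerance."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return high - (high - low) * reliability


def match_with_relaxation(first_elem, second_elem, table, first_ref, second_ref):
    """Compare against the first reference distance, then the second one.

    Returns the reference distance that produced a match, or None.
    """
    d = distance(first_elem, second_elem, table)
    for ref in (first_ref, second_ref):
        if d < ref:
            return ref
    return None


print(reference_from_reliability(0.5))                        # 125.0
print(match_with_relaxation("Q", "A", DISTANCES, 150, 200))  # 200
```

Returning the reference distance that succeeded lets a caller rank matches found under the stricter tolerance ahead of those found only after relaxation.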

A plurality of distances relevant to the similarity 
between the first character element and the second character 
element may be predetermined between the first character 
element and the second character element, and one distance 
selected from the plurality of distances may be used as the 
distance. 

The one distance selected from the plurality of distances 
may be determined based on user input. 

The distance may have a probabilistic distribution. 

Another retrieval method of the present invention 
is provided for searching a first character element string, 
obtained by subjecting a character string to character 
recognition, for a second character element string. The 
first character element string includes a plurality of 
character elements. For a specific character element of the 
plurality of character elements, a plurality of character 
elements having the possibility of being concatenated with 
the specific character element are predetermined. The 
retrieval method comprises the step of determining whether 
a character element string, obtained by concatenating the 
specific character element of the plurality of character 
elements with one character element of the plurality of 
character elements that is different from the specific 
character element, matches at least a part of the second 
character element string. Therefore, the above-described 
objective is achieved. 

The retrieval method may comprise the steps of 
selecting one character element from the plurality of 
character elements having the possibility of being 
concatenated with the specific character element, and 
determining whether a character element string obtained by 
concatenating the specific character element with the 
selected character element matches at least a part of the 
second character element string. 

The specific character element may be located at the 
end of a row or column, and the plurality of character 
elements having the possibility of being concatenated with 
the specific character element may each be located at the 
head of a row or column. 

The specific character element and one of the 
plurality of character elements having the possibility of 
being concatenated with the specific character element may 
be located in the same row or column. The specific character 
element and another one of the plurality of character 
elements having the possibility of being concatenated with 
the specific character element may be located in different 
rows or columns and in the same column or row. 
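A keyword split across a misrecognized paragraph boundary can be found by also trying each predetermined candidate successor of a paragraph. The sketch below works at paragraph level; the paragraph identifiers, the `successors` table, and the function names are hypothetical, and exact character comparison stands in for the distance-based element matching described earlier.

```python
def find_all(text, pattern):
    """Every start index of pattern in text."""
    return [i for i in range(len(text) - len(pattern) + 1)
            if text.startswith(pattern, i)]


def search_with_concatenation(paragraphs, successors, keyword):
    """Search recognized paragraphs, also trying each candidate successor.

    paragraphs: dict mapping a paragraph id to its recognized text.
    successors: dict mapping a paragraph id to the ids of paragraphs that
                may be concatenated after it (predetermined candidates).
    Returns (paragraph_id, successor_id_or_None, start_index) tuples.
    """
    hits = []
    n = len(keyword)
    for pid, text in paragraphs.items():
        # Matches lying entirely inside the paragraph.
        hits.extend((pid, None, i) for i in find_all(text, keyword))
        # Matches that start in pid and run into the head of a candidate.
        for nxt in successors.get(pid, []):
            joined = text + paragraphs[nxt]
            for i in range(max(0, len(text) - n + 1), len(text)):
                if joined.startswith(keyword, i):
                    hits.append((pid, nxt, i))
    return hits


# Hypothetical layout: "upper_left" should follow "upper_right", but
# recognition may instead have chained "lower_right" after it.
paragraphs = {"upper_right": "xxAB", "upper_left": "CDyy", "lower_right": "zz"}
successors = {"upper_right": ["upper_left", "lower_right"]}
print(search_with_concatenation(paragraphs, successors, "ABCD"))
# [('upper_right', 'upper_left', 2)]
```

The straddling loop starts only at positions where the keyword would overrun the paragraph end, so no match is reported twice.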

Another retrieval method of the present invention 
is provided for searching a first character element string, 
obtained by subjecting a character string to character 
recognition, for a second character element string. The 
first character element string includes at least one first 
character element and the second character element string 
includes at least one second character element. The 
retrieval method comprises the steps of obtaining a 
probability that a search result matches the second 
character element string, based on the number of the second 
character elements included in the second character element 
string and the number of the second character elements, 
out of those included in the second character element 
string, that match the corresponding first character 
elements; and determining the correctness of the search 
result based on the probability. Therefore, the 
above-described objective is achieved. 

A distance relevant to a similarity between the 
first character element and the second character element 
may be predetermined between the second character element 
and the corresponding first character element. The 
retrieval method may further comprise the steps of 
comparing the distance with a predetermined reference 
distance, and determining whether the second character 
element matches the corresponding first character element 
based on a result of the comparison of the distance with 
the predetermined reference distance. 

The retrieval method may further comprise the step 
of, for a second character element that is included in the 
second character element string but does not match the 
corresponding first character element included in the 
first character element string, resetting the predetermined 
reference distance and determining whether that second 
character element matches the corresponding first character 
element using the reset predetermined reference distance. 

The retrieval method may further comprise the step 
of dividing the second character element string into a 
plurality of character element portions. 
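One simple way to turn the two counts named above (the number of keyword elements and the number of them that match) into a probability is their ratio. That formula, the threshold `min_probability`, and the function names are assumptions of this sketch; the probability used in the described method may be defined differently. Each character element is a single character here, and exact comparison stands in for distance-based matching.

```python
def match_probability(window, keyword, is_match):
    """Fraction of keyword elements that match the aligned recognized elements."""
    if not keyword:
        raise ValueError("keyword must not be empty")
    if len(window) != len(keyword):
        raise ValueError("window and keyword must have the same length")
    matched = sum(1 for rec, key in zip(window, keyword) if is_match(rec, key))
    return matched / len(keyword)


def probabilistic_search(recognized, keyword, is_match, min_probability):
    """Return (start, probability) for every window whose estimated
    probability of matching the keyword reaches min_probability."""
    n = len(keyword)
    results = []
    for start in range(len(recognized) - n + 1):
        p = match_probability(recognized[start:start + n], keyword, is_match)
        if p >= min_probability:
            results.append((start, p))
    return results


def exact(rec, key):
    """Exact comparison stands in here for distance-based element matching."""
    return rec == key


print(probabilistic_search("xABZDx", "ABCD", exact, 0.75))  # [(1, 0.75)]
```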



A retrieval device of the present invention is 
provided for searching a first character element string, 
obtained by subjecting a character string to character 
recognition, for a second character element string. The 
first character element string includes a first character 
element and the second character element string includes 
a second character element. A distance relevant to a 
similarity between the first character element and the 
second character element is predetermined between the first 
character element and the second character element. The 
retrieval device comprises means for comparing the distance 
with a predetermined reference distance, and means for 
determining whether the second character element matches 
the first character element based on a result of the 
comparison of the distance with the predetermined reference 
distance. Therefore, the above-described objective is 
achieved. 

Another retrieval device of the present invention 
is provided for searching a first character element string, 
obtained by subjecting a character string to character 
recognition, for a second character element string. The 
first character element string includes a plurality of 
character elements. For a specific character element of the 
plurality of character elements, a plurality of character 
elements having the possibility of being concatenated with 
the specific character element are predetermined. The 
retrieval device comprises means for determining whether 
a character element string, obtained by concatenating the 
specific character element of the plurality of character 
elements with one character element of the plurality of 
character elements that is different from the specific 
character element, matches at least a part of the second 
character element string. Therefore, the above-described 
objective is achieved. 

Another retrieval device of the present invention 
is provided for searching a first character element string, 
obtained by subjecting a character string to character 
recognition, for a second character element string. The 
first character element string includes at least one first 
character element and the second character element string 
includes at least one second character element. The 
retrieval device comprises means for obtaining a 
probability that a search result matches the second 
character element string, based on the number of the second 
character elements included in the second character element 
string and the number of the second character elements, 
out of those included in the second character element 
string, that match the corresponding first character 
elements; and means for determining the correctness of the 
search result based on the probability. Therefore, the 
above-described objective is achieved. 

A computer readable recording medium of the present 
invention is provided in which a program is recorded for 
causing a computer to execute a retrieval process for 
searching a first character element string, obtained by 
subjecting a character string to character recognition, 
for a second character element string. The first character 
element string includes a first character element and the 
second character element string includes a second character 
element. A distance relevant to a similarity between the 
first character element and the second character element 
is predetermined between the first character element and 
the second character element. The retrieval process 
comprises the steps of comparing the distance with a 
predetermined reference distance, and determining whether 
the second character element matches the first character 
element based on a result of the comparison of the distance 
with the predetermined reference distance. Therefore, the 
above-described objective is achieved. 



Another computer readable recording medium of the 
present invention is provided in which a program is recorded 
for causing a computer to execute a retrieval process for 
searching a first character element string, obtained by 
subjecting a character string to character recognition, for 
a second character element string. The first character 
element string includes a plurality of character elements. 
For a specific character element of the plurality of 
character elements, a plurality of character elements 
having the possibility of being concatenated with the 
specific character element are predetermined. The 
retrieval process comprises the step of determining 
whether a character element string, obtained by 
concatenating the specific character element of the 
plurality of character elements with one character element 
of the plurality of character elements that is different 
from the specific character element, matches at least a 
part of the second character element string. Therefore, 
the above-described objective is achieved. 






Another computer readable recording medium of the 
present invention is provided in which a program is recorded 
for causing a computer to execute a retrieval process for 
searching a first character element string, obtained by 
subjecting a character string to character recognition, for 
a second character element string. The first character 
element string includes at least one first character element 
and the second character element string includes at least 
one second character element. The retrieval process 
comprises the steps of obtaining a probability that a search 
result matches the second character element string, based 
on the number of the second character elements included in 
the second character element string and the number of the 
second character elements, out of those included in the 
second character element string, that match the 
corresponding first character elements; and determining the 
correctness of the search result based on the probability. 
Therefore, the above-described objective is achieved. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a block diagram showing a retrieval 
device 1 shared by Examples 1 through 14 of the present 
invention. 


Figure 2 is a diagram showing an exemplary table 
indicating distances between character elements. 

Figure 3A is a diagram showing an exemplary 
character element consisting of one character radical. 

Figure 3B is a diagram showing an exemplary 
character element consisting of one character and one 
character radical. 






Figure 4 is a diagram showing a relationship between 
an original document and a recognition result obtained by 
subjecting the original document to character recognition. 






P21873 



Figure 5 is a diagram showing an example in which 
the reliability of character recognition with respect to 
individual character elements included in document data 
(recognition result) is predetermined as a search 
parameter. 

Figure 6 is a diagram showing a distance 
relationship between each of the character elements "A", 
"P", "M", and some of the other character elements. 

Figure 7 is a diagram showing an exemplary distance 
table prepared for the font type "Mincho font". 

Figure 8A is a diagram showing an example of a 
recognition error in which a plurality of character elements 
are recognized as a single character element. 

Figure 8B is a diagram showing an example of a 
recognition error in which a single character element is 
recognized as a plurality of character elements. 

Figure 9 is a diagram showing an exemplary character 
element distance table including the character elements 
shown in Figures 8A and 8B. 

Figure 10A is a diagram showing an exemplary 
distance table including the frequency of occurrence of a 
character element. 






Figure 10B is a diagram showing another exemplary 
distance table including the frequency of occurrence of a 
character element. 

Figure 10C is a diagram showing another exemplary 
distance table including the frequency of occurrence of a 
character element. 

Figure 11 is a diagram showing an exemplary 
relationship among the reliability of character recognition, 
a reference distance, and the frequency of occurrence. 

Figure 12 is a diagram showing an exemplary 
relationship among the reliability of character recognition, 
a reference distance, and the frequency of occurrence. 

Figure 13 is a diagram showing the steps of a process 
for producing a distance table having a frequency of 
occurrence. 

Figure 14 is a flowchart showing the steps of a 
retrieval process of searching a recognition result for a 
character element string matching a search keyword using 
a distance table having a frequency of occurrence. 

Figure 15 is a diagram showing an example in which 
candidates of the recognition result are additionally 
provided. 

Figure 16 is a diagram showing an exemplary original 
document including paragraphs A through P. 

Figure 17 shows an exemplary table indicating a 
result of recognizing the order of the paragraphs. 

Figure 18 is a flowchart showing the steps of a 
retrieval process of searching a recognition result for a 
character element string designated as a search keyword by 
taking into account a concatenation relationship between 
paragraphs. 

Figure 19 is a diagram showing an exemplary table 
indicating a concatenation relationship between paragraphs 
in a recognition result. 

Figure 20 is a diagram showing an exemplary 
recognition result obtained by subjecting an original 
document to character recognition. 

Figure 21 is a diagram showing an exemplary original 
document having a layout in the form of horizontal writing. 

Figure 22A is a diagram showing a table indicating 
a recognition result of character recognition under an 
assumption that an original document is in the form of 
vertical writing. 

Figure 22B is a diagram showing a table indicating 
a recognition result of character recognition under an 
assumption that an original document is in the form of 
horizontal writing. 

Figure 23 is a diagram showing a relationship 
between an original document and a recognition result 
obtained by subjecting the original document to character 
recognition. 

Figure 24 is a diagram showing an exemplary original 
document having a layout. 

Figure 25 is a diagram showing an original document 
and a recognition result obtained by subjecting the original 
document to character recognition. 

Figure 26 is a diagram used for explaining a 
retrieval process for searching a recognition result for 
a search word. 

Figure 27 is a diagram showing an exemplary 
probability table. 

Figure 28 is a diagram used for explaining a 
retrieval process for searching a recognition result for 
a search word. 

Figure 29 is a diagram used for explaining a 
retrieval process for searching a recognition result for 
a search word. 


Figure 30 is a flowchart showing the steps of a fuzzy 
retrieval process. 

Figure 31 is a diagram used for explaining a 
retrieval process for searching a recognition result for 
a search word. 

BEST MODE FOR CARRYING OUT THE INVENTION 

Hereinafter, the present invention will be 
described by way of illustrative examples with reference 
to the accompanying drawings. 

Figure 1 shows a configuration of a retrieval 
device 1 which is used in Examples 1 to 14 of the present 
invention. 

The retrieval device 1 includes: a terminal 100; a 
CPU 110 for executing a registration process and a retrieval 
process; an image input device 120 for inputting a document 
as image data; a memory 130 for storing the image data; a 
memory 140 for storing document data (recognition result) 
obtained by subjecting the image data to character 
recognition; a memory 150 for storing a table indicating 
the distance between character elements (hereinafter also 
referred to as a character element distance table); a 
character recognition pattern dictionary 160; a memory 170 
for storing a document registering program, a character 
recognition program, and a document searching program; and 
a working memory 180. 

The components of the retrieval device 1 may be 
connected with each other via an internal bus or a network. 

First, a flow of the registration process will be 
described below. 

A user provides an instruction to start the 
registration process through the terminal 100. Then, the 
document registering program in the memory 170 is started 
and loaded into the working memory 180. The CPU 110 
executes the document registering program. As a result, the 
image input device 120 reads the document as image data. 
The image data is then stored in the memory 130. The image 
input device 120 is, for example, a scanner, a digital camera, 
or a video camera. 



The document registering program starts the 
character recognition program in the memory 170. The 
character recognition program is then loaded into the 
working memory 180. The CPU 110 executes the character 
recognition program. As a result, the image data stored in 
the memory 130 is read out. The character information 
included in the image data is converted into a character 
code string, so that document data (recognition result) is 
obtained. The document data (recognition result) is stored 
in the memory 140. The conversion of the character 
information included in the image data into a character code 
string is performed with reference to the character 
recognition pattern dictionary 160. 

Any algorithm which performs character recognition 
may be used. In an exemplary algorithm of the character 
recognition program, image data may be extracted on a 
word-by-word basis and the extracted one-word image data 
may be converted into a character code. 

A flow of the retrieval process will be described 
below. 

A user enters a search keyword through the 
terminal 100 and provides an instruction to start the 
retrieval process. The document searching program in the 
memory 170 is started and loaded into the working memory 180. 
The CPU 110 executes the document searching program. As a 
result, whether a character element string corresponding 
to the search keyword exists in the document data 
(recognition result) is determined using the distance table 
stored in the memory 150. A result of the search is 
displayed on the terminal 100. The image data 
corresponding to the search result may also be displayed on 
the terminal 100. 

(Example 1) 

Figure 2 shows an exemplary character element 
distance table stored in the memory 150. The character 
element distance table provides numerical values 
representing the relationship (close or distant) between 
character elements. 

The term "character element" as used herein refers 
to one or more characters, one or more character radicals, 
or a combination of one or more characters and one or more 
character radicals. 

For example, "M" is a character element consisting 
of one character. "00" is a character element consisting 
of two characters. 


The term "character radical" refers to a part of a 
character. For example, a character radical corresponds to 
the left or right side of a Japanese Kanji character. 
Figure 3A shows an exemplary character element consisting 
of one character radical. Figure 3B shows an exemplary 
character element consisting of one character and one 
character radical. 

It should be noted that characters include symbols 
such as "©". 



The distance table of Figure 2 provides 
predetermined distances between character elements. The 
distance relates to the similarity between character 
elements: the greater the distance, the lower the 
similarity, and the smaller the distance, the greater the 
similarity. Therefore, the similarity between character 
elements is inversely related to the distance between them. 






In the example of Figure 2, the distance between a 
character element "S" and a character element is set 
to 10. The distance between the character element "J£L" and 
a character element "00" is set to 172. This means that 
the character element "M" is more similar to the character 
element than to the character element "00". Distances 
between other character elements are similarly 
predetermined. 

The distances can take arbitrary values as long as they 
are relevant to the similarities between character elements. 
For example, the distances may be derived from the 
input-output relationships of a specific character 
recognition system, from Euclidean distances in a feature 
space in which the shapes of individual character elements 
are represented by numerical feature values, or the like. 
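As an illustration of the feature-space option, the sketch below builds a distance table from Euclidean distances between feature vectors. The feature vectors in `FEATURES`, the scaling factor, and the integer rounding are hypothetical; they only show how such a table could take integer values like those in Figure 2.

```python
import math

# Hypothetical feature vectors for three character elements (for example,
# normalized shape measurements); Latin stand-ins replace Japanese elements.
FEATURES = {
    "A": (0.90, 0.10, 0.40),
    "A'": (0.85, 0.15, 0.40),
    "Q": (0.10, 0.80, 0.70),
}


def build_distance_table(features, scale=100):
    """Distance table from Euclidean distances between feature vectors,
    scaled and rounded to integers."""
    table = {}
    for a, fa in features.items():
        for b, fb in features.items():
            if a != b:
                table[(a, b)] = round(scale * math.dist(fa, fb))
    return table


table = build_distance_table(FEATURES)
print(table[("A", "A'")], table[("A", "Q")])  # 7 110
```

`math.dist` requires Python 3.8 or later.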

It should be noted that the distances between 
character elements need not be represented by a table in 
the form of a matrix as shown in Figure 2. The distances 
between character elements can be represented in any form 
as long as the distances are relevant to the similarities 
between character elements. For example, character 
elements may be listed in the distance table in such a 
manner that, for each character element, the other 
character elements are arranged in ascending order of 
distance. The order itself may then be regarded as a 
distance. 
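The list form mentioned above, in which the rank itself serves as a distance, might be derived from a matrix-style table as sketched here. The function names and the sample row are hypothetical, and using a 1-based rank is an assumption.

```python
def to_ranked_lists(matrix):
    """Convert a matrix-style table {elem: {other: distance}} into lists of
    the other elements in ascending order of distance."""
    return {elem: sorted(row, key=row.get) for elem, row in matrix.items()}


def rank_distance(ranked, a, b):
    """Use the 1-based position of b in a's list as the distance,
    or None when b is not listed for a."""
    order = ranked.get(a, [])
    return order.index(b) + 1 if b in order else None


# Hypothetical matrix row; Latin stand-ins replace the Japanese elements.
matrix = {"A": {"A'": 10, "Q": 172, "R": 90}}
ranked = to_ranked_lists(matrix)
print(ranked["A"])                      # ["A'", 'R', 'Q']
print(rank_distance(ranked, "A", "Q"))  # 3
```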

Hereinafter, a description will be given of a 
retrieval method using a table indicating distances between 
character elements. 

Figure 4 shows a relationship between an original 
document and a recognition result obtained by subjecting 
the original document to character recognition. In the 
example of Figure 4, an original document containing the 
character string "...B2fc©An^EfcK..." is subjected to 
character recognition, resulting in "...B^COAE^fiS^..." 
being recognized (recognition result). The recognition 
result is stored in the memory 140 as document data. The 
memory 140 may be a storage medium of any type. 

In general, when character recognition is performed 
using character recognition technology, errors occur in the 
character recognition due to various factors. In the 
example of Figure 4, a character "2fc" is incorrectly 
recognized as a character "^", and a character "P" is 
incorrectly recognized as a character "IS". 

It is now assumed that a character element string 
"0^" is designated as a search keyword, and the recognition 
result of Figure 4 is searched for a character element string 
corresponding to the search keyword. This retrieval 
process will be described below. The retrieval process is 
executed by the CPU 110 in accordance with the document 
searching program. 

First, the character element distance table is referenced with respect to the character element "B" of the designated character element string "0^". The recognition result of Figure 4 is then searched for character elements whose distance from "B" is less than a predetermined reference distance (e.g., 150); for the character element "B", for example, the character element "0" and a character element "U" are searched for. In this case, the character element "0" is detected as a result of the search.

Next, the character element distance table is referenced with respect to the next character element of the designated character element string "0^". It is then determined whether any of the character elements whose distance from that element is less than the predetermined reference distance (e.g., 150) matches the character element positioned next to the detected character element "0"; for the character element "^", for example, the character element "^", a character element ">fc", and a character element "jz" are searched for. In this case, the character element "7fc" matches the character element positioned next to the detected character element "B". Therefore, for the designated character element string "0^h", a character element string "UJs" can be detected in the recognition result of Figure 4.

In this manner, even when the original character string "0^" has been incorrectly recognized as "0^", the original location of the character string can be detected.
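The two-step procedure above (find candidates for the first element, then check the element that follows) can be sketched in Python as a window scan. This is a minimal illustration, not the document searching program itself: `DISTANCES`, `distance`, and `search` are hypothetical names, Latin letters stand in for the Japanese characters, and pairs missing from the table are assumed to be far apart (1000). The reference distance of 150 and the strict "less than" comparison follow the text.

```python
# Hypothetical distance table (symmetric); unlisted pairs are treated as far apart.
DISTANCES = {("a", "o"): 100, ("a", "e"): 120, ("t", "f"): 140}


def distance(a: str, b: str) -> int:
    if a == b:
        return 0
    return DISTANCES.get((a, b), DISTANCES.get((b, a), 1000))


def search(recognized: str, keyword: str, reference: int = 150) -> list[tuple[int, str]]:
    """Return (position, matched text) for every window of the recognition result
    whose elements are all closer than `reference` to the keyword's elements."""
    hits = []
    n = len(keyword)
    for start in range(len(recognized) - n + 1):
        window = recognized[start:start + n]
        # all() stops at the first element that is too far away, so a later keyword
        # element is only examined once the elements before it have matched.
        if all(distance(k, r) < reference for k, r in zip(keyword, window)):
            hits.append((start, window))
    return hits


print(search("the cot is here", "cat"))  # [(4, 'cot')]
```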

In practical applications, when a designated character element string is detected, it is preferable to present not only the detected character element string but also the recognition results of the sentences before and after it. It is also preferable to store the original document separately, as document image data obtained before character recognition, and to present the corresponding document image to the searcher. Thus, even when a character recognition result is partially incorrect, the searcher can check the original document and obtain the required information.

Alternatively, when a designated character element string is detected in a document, the title or abstract of the document may be displayed in addition to the sentences before and after the detected character element string. In this case, the search result can be grasped even in a small display space. Further, if audio output is used in addition to a display to present the sentences before and after a detected character element string, the title, and the abstract, the display area of a terminal can be reduced. A search result may also be output via a communication path (network). When a search result is transferred via a communication path with a narrow bandwidth, the image of the document is not displayed at first; only the recognition result before and after the detected character element string, the title, and/or the abstract of the document are displayed initially. The image, which carries a large amount of information, is displayed only when the searcher requests it, thereby saving search time and/or reading time.

Further, when a designated character element string is detected, a command may be issued to a device instead of presenting the detected information. For example, images obtained in real time from a camera or the like are searched. When a designated character element string (e.g., the Japanese word for "restaurant") is detected, a command to store the image in a memory is issued to the image-capturing device. In this way, images of restaurants can be collected automatically.

When a designated character element string is detected, a command to print out an image including the designated character element string may be issued to a printer, or image information including the designated character element string may be distributed to a plurality of addresses via a communication network.

It should be noted that the reference distance, which is compared with distances between character elements, is not limited to 150 and may be set to an arbitrary value. The reference distance need not be fixed but may be variable. It may be determined based on an input by a user, or based on the result of an operation executed by the CPU 110.

For example, the reference distance is set to a small value at first. If no character element string corresponding to the search keyword can be detected in the recognition result, the reference distance may be reset to a successively larger value and the search performed again. That is, the search is initially performed with a low tolerance level to character recognition error, and the tolerance level is then increased step by step. This prevents character element strings irrelevant to the search keyword from being detected, as could happen if a high tolerance level to character recognition error were set from the beginning.
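The step-by-step relaxation of the reference distance can be sketched as a loop over increasing thresholds. The schedule `(30, 60, 110, 150)`, the distance values, and the function names are assumptions chosen for illustration.

```python
# Hypothetical distance table (symmetric); unlisted pairs are treated as far apart.
DISTANCES = {("a", "o"): 100, ("a", "e"): 120}


def distance(a: str, b: str) -> int:
    if a == b:
        return 0
    return DISTANCES.get((a, b), DISTANCES.get((b, a), 1000))


def search(recognized: str, keyword: str, reference: int) -> list[int]:
    """Start positions of windows whose elements are all closer than `reference`."""
    n = len(keyword)
    return [
        start
        for start in range(len(recognized) - n + 1)
        if all(distance(k, r) < reference for k, r in zip(keyword, recognized[start:start + n]))
    ]


def widening_search(recognized: str, keyword: str, references=(30, 60, 110, 150)):
    """Search with a strict tolerance first and relax it only while nothing is found."""
    for reference in references:
        hits = search(recognized, keyword, reference)
        if hits:
            return reference, hits
    return None, []


print(widening_search("the cat", "cat"))  # (30, [4])
print(widening_search("the cot", "cat"))  # (110, [4])
```

Returning the reference at which the first hit appeared also tells the caller how much error tolerance was needed, which could be shown alongside the result.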
Further, the reliability (or likelihood, accuracy, probability, etc.) of character recognition may be held as a search parameter along with the document data (recognition result) obtained by subjecting an original document to character recognition. The reference value of distance used in a search (reference distance) may then be set to an appropriate value depending on the search parameter. The search parameter may be stored in the memory 140, for example.

Figure 5 shows an example in which a reliability of character recognition is predetermined as a search parameter for each character element included in the document data (recognition result). In this case, the reliability is represented by a value in the range from 0 to 1. A larger reliability value means that the recognition result is more likely to be correct.

Hereinafter, a retrieval process will be described in which a character element string "AP^lfiS" is designated as a search keyword and the recognition result shown in Figure 5 is searched for a character element string corresponding to the search keyword. This retrieval process is executed by the CPU 110 in accordance with the document searching program.

Figure 6 shows the distance relationships between each of the character elements "A", "P", "IS", and "JS5c" and some other character elements.

As shown in Figure 5, the reliabilities of the character element "A" and of one other character element in the recognition result are predetermined to be 0.9. For a character element with such a high predetermined reliability, the reference distance compared with distances between character elements is set to a low value. In the example of Figure 5, for instance, the reference distance for a character element with a reliability of 0.9 is set to 10. For another character element in the recognition result, the reliability is predetermined to be 0.4. For a character element with such a low predetermined reliability, the reference distance is set to a high value. In the example of Figure 5, the reference distance for a character element with a reliability of 0.4 is set to 60.

Varying the reference distances according to the reliability of character recognition in this manner improves the precision of the search. For example, the incorrectly recognized character element in the recognition result has a low reliability, so its reference distance is set to a high value (60). Therefore, the character element "P", which is at a distance of 50 from the character element "IS", becomes a candidate in the search, because 50 is less than 60. As a result, when the character element string "An$tJS£" is designated, a character element string including the misrecognized character can be retrieved.

In this manner, for a character element or document with a low reliability of character recognition, the distances between character elements tolerable in the search (reference distances) are set to large values for the character element distance table. Conversely, for a character element or document with a high reliability of character recognition, the tolerable distances (reference distances) are set to small values. This suppresses the detection of extra character elements irrelevant to the search keyword.

It should be noted that a correspondence between a 
value of a reliability and a tolerable distance (reference 
distance) in a character element distance table is 
predetermined. 


Further, when the reliability of character recognition is significantly low, the retrieval process may be switched to another retrieval process in which all character elements can be searched for.

A search parameter (reliability) may be attached to each document or to each character element. The reliability of character recognition may be an output of a character recognition system (e.g., a neural network) or the number of recognition candidates.
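A per-element, reliability-dependent reference distance can be sketched as follows. Only the pairs 0.9 to 10 and 0.4 to 60 come from the Figure 5 example. The other thresholds in `RELIABILITY_TO_REFERENCE`, the hypothetical "P"/"R" distance of 50 (mirroring the "P"/"IS" pair), and the use of an infinite reference distance to model the fallback of searching for all character elements when reliability is very low are assumptions.

```python
import math

# Hypothetical predetermined correspondence: (minimum reliability, reference distance).
# Only 0.9 -> 10 and 0.4 -> 60 come from the example; the thresholds are assumed.
RELIABILITY_TO_REFERENCE = [(0.8, 10), (0.5, 30), (0.2, 60)]

# Hypothetical distance table: "R" is a plausible misreading of "P".
DISTANCES = {("P", "R"): 50}


def reference_for(reliability: float) -> float:
    for min_reliability, reference in RELIABILITY_TO_REFERENCE:
        if reliability >= min_reliability:
            return reference
    # Very low reliability: accept any element (search over all character elements).
    return math.inf


def distance(a: str, b: str) -> int:
    if a == b:
        return 0
    return DISTANCES.get((a, b), DISTANCES.get((b, a), 1000))


def search(recognized: str, reliabilities: list[float], keyword: str) -> list[int]:
    """Compare each recognized element against its own reliability-dependent reference."""
    n = len(keyword)
    hits = []
    for start in range(len(recognized) - n + 1):
        if all(
            distance(k, recognized[start + i]) < reference_for(reliabilities[start + i])
            for i, k in enumerate(keyword)
        ):
            hits.append(start)
    return hits


print(search("ARCD", [0.9, 0.4, 0.9, 0.9], "APCD"))  # [0]
print(search("ARCD", [0.9, 0.9, 0.9, 0.9], "APCD"))  # []
```

The second call shows the same misreading being rejected once its reliability is high, because 50 is not below the reference of 10.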

In Example 1, the character elements of the character element string designated as a search keyword are searched for one by one, sequentially from the leading character "0". The character elements may instead be searched for one by one in a different order. In particular, the frequency of occurrence of each character element in general documents may be taken into account: if the search begins with the character element of the search keyword that occurs least frequently in general documents, unnecessary retrieval processing can be reduced, thereby increasing the search speed.
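The benefit of starting with the rarest element can be shown by counting comparisons. In this sketch the values in `GENERAL_FREQUENCY` are illustrative, exact matching stands in for the distance table, and elements missing from the frequency table are treated as the rarest (an assumption).

```python
# Illustrative occurrence frequencies of elements in general documents.
GENERAL_FREQUENCY = {"e": 0.127, "t": 0.091, "a": 0.082, "q": 0.001}


def exact(a: str, b: str) -> int:
    """Stand-in distance: 0 for identical elements, otherwise far apart."""
    return 0 if a == b else 1000


def search(recognized: str, keyword: str, order, reference: int = 50):
    """Compare keyword elements in the given order; return hits and comparison count."""
    hits, comparisons = [], 0
    n = len(keyword)
    for start in range(len(recognized) - n + 1):
        for i in order:
            comparisons += 1
            if exact(keyword[i], recognized[start + i]) >= reference:
                break  # this window cannot match; skip its remaining elements
        else:
            hits.append(start)
    return hits, comparisons


def rarest_first(keyword: str) -> list[int]:
    """Keyword positions sorted by ascending general frequency (unknown = rarest)."""
    return sorted(range(len(keyword)), key=lambda i: GENERAL_FREQUENCY.get(keyword[i], 0.0))


text, keyword = "eeeeeeqe", "eq"
print(search(text, keyword, order=range(len(keyword))))  # ([5], 13)
print(search(text, keyword, order=rarest_first(keyword)))  # ([5], 8)
```

On this toy input, checking the rare "q" first rejects most windows after a single comparison, while the result is unchanged.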
It should be noted that in Example 1, the document data of a recognition result is assumed to be stored in advance in a storage medium (a memory, a magnetic disk, an optical disk, etc.). Alternatively, image data input from an image input device (a scanner, a digital camera, a video camera, etc.) may be subjected to character recognition sequentially, and the resulting real-time information may be searched in the same way.



In this manner, a character element string is searched for using a character element distance table, making it possible to retrieve the character element string corresponding to a designated character element string from document data even when the designated character element string has been replaced with a different character element string due to recognition error.

Further, using a distance table removes the need for complicated distance calculations, so that a high-speed search can be achieved.



Further, using a distance table allows the tolerance level to recognition error to be set to an appropriate value, enabling an efficient search.



Furthermore, search parameters (e.g., the reliability of character recognition) are attached to the document data, making it possible to select references or switch retrieval methods so that the search is suited to the document data or character elements. Therefore, the precision of the search can be improved.

(Example 2) 

In Example 2, a plurality of distance tables are provided. That is, a plurality of distances, each relevant to the similarities between character elements, are predetermined between character elements. One table is selected from the plurality of distance tables and used to search for a character element string. Specifically, one of the plurality of distances predetermined between character elements is selected, and whether the character elements match or mismatch is determined by comparing the selected distance with a predetermined reference distance. The distance is selected, for example, in accordance with a user's input.



It should be noted that the basic flow of the process for searching for a character element string is similar to that of Example 1.

The plurality of distance tables are provided, for example, for respective character recognition systems of different types. Alternatively, the distance tables are provided for respective character types (e.g., Japanese Kanji characters, the English alphabet, Greek characters, Japanese Katakana characters, etc.) or for respective font types.



For example, Figure 2 shows an exemplary distance table prepared for the "Gothic" font type, while Figure 7 shows an exemplary distance table prepared for the "Mincho" font type.

One of the distance tables is used selectively depending on the document data to be searched. For example, when an original document is subjected to character recognition, information such as the character types and font types included in the resulting document data and the type of character recognition system used is held in the document data as search parameters, making it possible to select an appropriate table when the search is performed. Therefore, the precision and speed of the search can be improved.

In the case where distance tables are switched depending on font types, the document data obtained by subjecting an original document to character recognition is preferably provided with information indicating whether the font of each character element is closer to the Mincho font or to the Gothic font, this information being attached to each character element as a search parameter. By referencing this search parameter, the distance tables are switched as follows: when document data including Gothic-font character elements is searched for a character element string, a distance table as shown in Figure 2 is used, and when document data including Mincho-font character elements is searched, a distance table as shown in Figure 7 is used. Information indicating font types can be obtained by recognizing font types while recognizing characters.
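Switching tables according to a font type attached to each recognized element might look like the following sketch. The tables and their values are hypothetical; only the idea of attaching the font type to each character element and choosing the Gothic or Mincho table from it comes from the text.

```python
# Hypothetical per-font distance tables: in this made-up data, "l" and "1" look
# alike in the Gothic font but are easier to tell apart in the Mincho font.
TABLES = {
    "gothic": {("l", "1"): 10},
    "mincho": {("l", "1"): 80},
}


def distance(a: str, b: str, table: dict) -> int:
    if a == b:
        return 0
    return table.get((a, b), table.get((b, a), 1000))


def search(recognized: str, fonts: list[str], keyword: str, reference: int = 50) -> list[int]:
    """fonts[i] is the search parameter attached to recognized[i]."""
    n = len(keyword)
    hits = []
    for start in range(len(recognized) - n + 1):
        if all(
            distance(k, recognized[start + i], TABLES[fonts[start + i]]) < reference
            for i, k in enumerate(keyword)
        ):
            hits.append(start)
    return hits


print(search("1ab", ["gothic"] * 3, "lab"))  # [0]
print(search("1ab", ["mincho"] * 3, "lab"))  # []
```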

Further, a plurality of distance tables may be switched for the same document data. In this case, document data from which the search character element string could not be retrieved is searched again for that string using a different criterion. As a result, the precision of the search can be improved. Further, the region of the document image corresponding to the location of a character element string detected by a retrieval process using a distance table may be subjected to character recognition with a higher level of precision. For example, a rough but high-speed search is performed using a distance table to narrow down the candidates. Thereafter, each candidate can be confirmed using higher-precision character recognition (which generally takes longer). Therefore, both search precision and search speed can be improved.

In particular, when a character string consisting of a small number of characters (e.g., a two-character word) is searched for, a similar character string is quite likely to be retrieved by accident. In such a situation, therefore, the search may be performed again using a different distance table selected depending on the number of character elements in the character string. Character recognition with a higher level of precision may also be used. Thus, a high-precision search can be achieved without unnecessarily increasing the processing time.

Further, distance tables may be switched depending on the character type of the character element string designated as a search keyword. For example, when both the character element string designated as a search keyword and the document data include only English alphabet characters, a distance table for English alphabet characters is used, eliminating unnecessary search processing.



It should be noted that in Example 2, a distance table is used to compensate for character recognition errors in units of one character. However, a distance table may also be used to compensate for character recognition errors in units of one or more characters.

Figure 8A shows an example of a recognition error in which a plurality of character elements are recognized as a single character element. In Figure 8A, the two characters "Tfc" are incorrectly recognized as a single character, and two "0" characters (zeros) are incorrectly recognized as a single character "∞" (infinity).



Figure 8B shows an example of a recognition error in which a single character element is incorrectly recognized as a plurality of character elements. In Figure 8B, a single character "JN" is incorrectly recognized as three "1" characters (ones), and another single character is incorrectly recognized as "L" and "1" (one).

If, in a distance table, the distance between the two characters "7fC" and the corresponding single character, the distance between two "0" characters (zeros) and the single character "∞" (infinity), the distance between the single character "JIJ" and three "1" characters (ones), and the distance between the single character "V" and the two characters "U" and "1" (one) are set to small values, a correct search result can be obtained even with recognition errors such as those shown in Figures 8A and 8B.

Figure 9 shows an exemplary character element distance table including the character elements shown in Figures 8A and 8B.
In the example of Figure 9, the distance between the two characters of Figure 8A and the single character they were recognized as, the distance between two "0" characters (zeros) and the single character "∞" (infinity), the distance between the single character of Figure 8B and three "1" characters (ones), and the distance between the other single character of Figure 8B and the two characters "l/" and "1" (one) are each set to 13 or less. These distances are considerably smaller than the distances to other character elements (i.e., 98 or more).

For example, suppose that the character element string "1 0 0" is designated as a search keyword and the distance used as the reference for the tolerance level to recognition error (reference distance) is set to 50. In this case, after "1" is retrieved, the single character "∞" as well as the two characters "0 0" can be searched for, because the distance between "0 0" and "∞" (13 or less) is below 50. Therefore, even if "1 0 0" is incorrectly recognized as "1∞", the character element string "1 0 0" used as the search keyword can be retrieved.

Similarly, even if "^ 5 5" is incorrectly recognized as "tliD", the character element string "<5 0" used as a search keyword can be retrieved.
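Table entries that pair strings of different lengths require the matcher to consume a variable number of keyword and recognized elements at each step. The following Python sketch does this with a memoized recursive walk. `MAX_SPAN`, the entries `("00", "∞")` and `("rn", "m")`, and all function names are assumptions for illustration, and this backtracking search is only one possible way to use such a table, not the procedure of the original system.

```python
from functools import lru_cache
from typing import Optional

# Hypothetical entries; some pair strings of different lengths,
# e.g. two zeros misread as an infinity sign, or "rn" misread as "m".
DISTANCES = {("00", "∞"): 10, ("rn", "m"): 8}
MAX_SPAN = 3  # longest keyword or recognized span treated as one element


def distance(a: str, b: str) -> int:
    if a == b:
        return 0
    return DISTANCES.get((a, b), DISTANCES.get((b, a), 1000))


def match_end(recognized: str, start: int, keyword: str, reference: int) -> Optional[int]:
    """Return where a match of `keyword` beginning at `start` ends, or None."""

    @lru_cache(maxsize=None)
    def walk(k: int, r: int) -> Optional[int]:
        if k == len(keyword):
            return r  # every keyword element has been consumed
        for kl in range(1, min(MAX_SPAN, len(keyword) - k) + 1):
            for rl in range(1, min(MAX_SPAN, len(recognized) - r) + 1):
                if distance(keyword[k:k + kl], recognized[r:r + rl]) < reference:
                    end = walk(k + kl, r + rl)
                    if end is not None:
                        return end
        return None

    return walk(0, start)


def search(recognized: str, keyword: str, reference: int = 50) -> list[tuple[int, str]]:
    hits = []
    for start in range(len(recognized)):
        end = match_end(recognized, start, keyword, reference)
        if end is not None:
            hits.append((start, recognized[start:end]))
    return hits


print(search("price 1∞ yen", "100"))  # [(6, '1∞')]
print(search("tum", "turn"))          # [(0, 'tum')]
```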

Further, the following cases can also be handled: when an original document itself contains incorrect expressions due to errors such as Kana-Kanji conversion errors (e.g., an incorrect expression is presented instead of "3ft&&&"); when a Japanese Kanji character is followed by a different set of Japanese Kana characters showing its inflection (e.g., "$tt>%>" is presented instead of "^&"); when a word written in Japanese Kanji characters is searched for using Japanese Hiragana characters (e.g., using "'ft'O <£ *}"); when a synonym is searched for; or when a word in a different language is searched for (e.g., "history" is searched for using "BI5&"). In these cases, the distance between "J&" and its counterpart, between "$tt>" and its counterpart, between "UP IB" and "i£-2$", between "4fiJfr" and "Jfett", or between "history" and "BlSd" is respectively set to a small value, so that a correct search result can be obtained.

(Example 3)

In Example 3, the table indicating distances between character elements holds the frequencies of occurrence of character elements in addition to the distances. Therefore, distances between character elements can be handled as if they had a probability distribution.

Figure 10A shows an exemplary distance table including the frequency of occurrence. In Figure 10A, for a character element "T", the frequency (probability) of occurrence of a character element "T" is 0.2 at a distance of 10, 0.6 at a distance of 20, and 0.2 at a distance of 30.

Figure 10B shows another exemplary distance table including the frequency of occurrence. In the example of Figure 10B, the frequency of occurrence is assumed to follow a normal distribution and is represented by a mean distance and a variance. For example, Figure 10B shows that, for the character element "T", the character element "T" follows a normal distribution centered at a distance of 20 with a variance of 10.
Figure 10C shows another exemplary distance table including the frequency of occurrence. In the example of Figure 10C, the frequency of occurrence is assumed to follow a uniform distribution and is represented by the shortest and the greatest distance. For example, for the character element "T", a character element "F" follows a uniform distribution over the distance range of 50 to 70, and a character element "b" follows a uniform distribution over the distance range of 63 to 122. Since the ranges of the character elements "F" and "b" overlap between 63 and 70, the frequency of occurrence of each of these character elements in that range is determined to be 0.5.
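The overlap rule for the uniform distributions of Figure 10C can be written as a small function. The two ranges are the ones given for the character element "T"; representing them as a dictionary and splitting the frequency equally among all candidates whose ranges contain the distance are assumptions that reproduce the 0.5 stated in the text.

```python
# Hypothetical encoding of Figure 10C for the element "T": each candidate element
# is uniformly distributed over [shortest distance, greatest distance].
UNIFORM_RANGES = {"F": (50, 70), "b": (63, 122)}


def occurrence_frequencies(d: float, ranges=UNIFORM_RANGES) -> dict[str, float]:
    """Split the frequency equally among the candidates whose ranges contain d."""
    inside = [elem for elem, (lo, hi) in ranges.items() if lo <= d <= hi]
    return {elem: 1 / len(inside) for elem in inside}


print(occurrence_frequencies(55))   # {'F': 1.0}
print(occurrence_frequencies(65))   # {'F': 0.5, 'b': 0.5}
print(occurrence_frequencies(100))  # {'b': 1.0}
```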



As described above, the distance table includes the frequency of occurrence, as shown in Figures 10A through 10C. Therefore, the distance between character elements is not a fixed value but can be handled as a value with a certain spread, i.e., a probability distribution. A search can thus be performed depending on the frequencies of occurrence of the character elements.

A retrieval process in which a character element string is searched for using a distance table including the frequency of occurrence will now be described with reference to Figure 11. The basic flow of this retrieval process is similar to that of Examples 1 and 2.

First, the distances between character elements tolerable in the search (reference distances) are determined depending on the reliability of character recognition. Thereafter, the frequency of occurrence of a character element corresponding to the reference distance is calculated based on the distance table. In this case, the relationship among the reliability of character recognition, the reference distance, and the frequency of occurrence is predetermined.

Figure 11 shows an exemplary relationship among the reliability of character recognition, the reference distance, and the frequency of occurrence. It is assumed that the distance (reference distance) tolerable for the character element "A" (reliability 0.9) in the document data is 10. The rate (frequency of occurrence) at which the character element "A" in the document data corresponds to the character element "A" in the search keyword is 0.9. Similarly, it is assumed that the distance (reference distance) tolerable for the character element "K" (reliability 0.4) in the document data is 60. The rate (frequency of occurrence) at which the character element "E£" in the document data corresponds to the character element "Q" in the search keyword is 0.1. Similarly, the character elements "#1" and "fi£" have a frequency of occurrence of 0.9.

In this case, the average of the rates at which the character elements of the character element string "APKlJJfc" designated as a search keyword correspond to character elements in the document data is 0.7 (= (0.9 + 0.1 + 0.9 + 0.9) / 4). Suppose that, as the criterion for retrieval, the average match rate is required in advance to be 0.5 or more, for example. Since 0.7 exceeds 0.5, the above-described misrecognized character element string "A^ISfiSo" can be retrieved for the character element string "AD$!fi£" used as a search keyword.
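The worked average can be checked with a short sketch that also rejects any window containing an element with zero frequency, as the Figure 14 process does. The keys in `FREQUENCY` use Latin stand-ins ("A", "P" misread as "R", "C", "D") for the characters of Figure 11 and are hypothetical; the rates and the 0.5 threshold are the ones in the example.

```python
# Hypothetical encoding of the Figure 11 example:
# (keyword element, recognized element) -> frequency of occurrence (match rate).
FREQUENCY = {("A", "A"): 0.9, ("P", "R"): 0.1, ("C", "C"): 0.9, ("D", "D"): 0.9}


def frequency_string(keyword: str, window: str) -> list[float]:
    return [FREQUENCY.get((k, r), 0.0) for k, r in zip(keyword, window)]


def is_match(keyword: str, window: str, threshold: float = 0.5) -> bool:
    """Accept the window when every element has a nonzero frequency and the
    average frequency (match rate) reaches the threshold."""
    if len(window) != len(keyword):
        return False
    freqs = frequency_string(keyword, window)
    if any(f <= 0.0 for f in freqs):
        return False
    return sum(freqs) / len(freqs) >= threshold


print(round(sum(frequency_string("APCD", "ARCD")) / 4, 2))  # 0.7
print(is_match("APCD", "ARCD"))                               # True
```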
Further, when a search result is displayed, the display may be modified depending on the match rate. For example, the location in a document image corresponding to a search result is highlighted (by enhanced brightness or color, or by blinking) according to the level of the match rate. This makes it easy for the searcher to confirm the match rate visually.

Further, in the above-described example, the average of the match rates of the character elements included in the search keyword is used as the criterion for retrieval. Alternatively, the minimum match rate may be used as the criterion. Another possible criterion is that character elements with high match rates (e.g., 0.8 or more) account for at least a certain proportion (e.g., half) of the entire character element string. Further, when character elements with high match rates (e.g., 0.8 or more) account for at least a certain proportion (e.g., 2/3) of the entire character element string, the criterion may be relaxed for the remaining character elements with low match rates so that those character elements can be detected more easily.
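The alternative criteria can be expressed as interchangeable predicates over the string of match rates. The function name `accept` and its parameter names are hypothetical; the 0.5 threshold, the 0.8 rate, and the one-half proportion are the example values from the text.

```python
def accept(rates: list[float], criterion: str = "average",
           threshold: float = 0.5, high: float = 0.8, share: float = 0.5) -> bool:
    """Decide whether a candidate string matches, under one of three criteria."""
    if criterion == "average":
        return sum(rates) / len(rates) >= threshold
    if criterion == "minimum":
        return min(rates) >= threshold
    if criterion == "share":
        # Proportion of elements whose match rate is at least `high`.
        return sum(r >= high for r in rates) / len(rates) >= share
    raise ValueError(f"unknown criterion: {criterion}")


rates = [0.9, 0.1, 0.9, 0.9]
print(accept(rates, "average"), accept(rates, "minimum"), accept(rates, "share"))
```

The minimum criterion is the strictest of the three: a single badly misrecognized element rejects the candidate even when the rest match well.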

For example, as shown in Figure 12, it is assumed that document data containing an incorrectly recognized version of the character element string "APlJ^S" is searched for that character element string as a search keyword. In this case, the character element "[rI", which has been incorrectly recognized from the character element "P", has a reliability of character recognition of 0.3, a tolerable distance (reference distance) of 80, and a match rate of 0.0 with respect to the character element "□". In this situation, the character elements "IrI" and "□" do not match at all. However, the other character elements included in the search keyword, such as "A" and "J^", have a high match rate (0.9). Therefore, the tolerable distance (reference distance) for the character element "I§!" is set to a value greater than 80 (e.g., 120), so that a character element string matching the search keyword "AnStfitSc" can be detected (when the distance table of Figure 6 is used).
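The relaxation applied in the Figure 12 example can be sketched as widening the reference distances of weak elements once enough elements match well. The multiplicative `factor` of 1.5 is an assumption chosen so that 80 becomes 120 as in the text; the 2/3 share and the 0.8 rate follow the previous paragraph, and the references of 10 for the other elements follow the 0.9-reliability mapping of Figure 5.

```python
def relaxed_references(rates: list[float], references: list[float],
                       high: float = 0.8, share: float = 2 / 3, factor: float = 1.5) -> list[float]:
    """If at least `share` of the elements match well (rate >= high), widen the
    reference distance of the remaining weak elements by `factor`."""
    strong = sum(r >= high for r in rates)
    if strong / len(rates) < share:
        return list(references)
    return [ref if r >= high else ref * factor for r, ref in zip(rates, references)]


# Figure 12 example: the misrecognized element has match rate 0.0 and reference 80.
print(relaxed_references([0.9, 0.0, 0.9, 0.9], [10, 80, 10, 10]))  # [10, 120.0, 10, 10]
```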

Figure 13 shows the steps of a process for producing a distance table including the frequency of occurrence. Figure 13 also shows how the distance between a character element "X" and a character element "Y", and the frequency of occurrence of that distance, are defined. A similar process is applied to all combinations of character elements, so that distances and frequencies of occurrence are defined for every combination of character elements.

The process of Figure 13 is repeated a statistically sufficient number of times so that the obtained frequency of occurrence of a distance D between the character element "X" and the character element "Y" becomes a statistically reliable value.

It should be noted that the distance table including the frequency of occurrence is produced using a neural network NN that has been trained on the characters in advance. The neural network NN is not limited to a specific type.



Figure 14 shows the steps of a retrieval process for searching a recognition result for a character element string corresponding to a search keyword using a distance table including the frequency of occurrence. This retrieval process is executed by the CPU 110 in accordance with the document searching program.

In the retrieval process of Figure 14, it is determined whether the recognition result includes a character element corresponding to a character element "X" of the character element string used as a search keyword. A similar process is repeated for all character elements in that character element string.



When all character elements of the search keyword can be detected sequentially in the recognition result and every character element has a probability greater than zero, a string of frequencies of occurrence corresponding to the search keyword is obtained. Based on this string of frequencies of occurrence, it is determined whether the recognition result includes a character element string corresponding to the search keyword. This determination may be based on the average or the minimum of the string of frequencies of occurrence.

In Figure 14, a reliability R indicates the reliability of character recognition. The reliability R is predetermined for each character element included in the recognition result. The relationship between the reliability R and the reference distance D is also predetermined.



As described above, information on the frequency of occurrence of character elements is included in the distance table in addition to the distances between character elements. Therefore, search criteria or methods can be switched depending on the frequency of occurrence even when the distances are the same, making it possible to achieve a higher-precision search.

(Example 4) 

In Example 4, a character element string designated 
as a search keyword is developed into a plurality of 
character element strings in advance using a character 
element distance table. A retrieval process is executed for 
each of the plurality of character element strings. 

Hereinafter, a retrieval process will be described in which a character element string "EJ" is designated as a search keyword and the recognition result of Figure 4 is searched for a character element string corresponding to the search keyword. This retrieval process is executed by the CPU 110 in accordance with the document searching program.

First, the character element string "Hi" is divided into a character element "B" and a character element "2fc". The table indicating distances between character elements is referenced for each character element, and every character element whose distance from that character element is less than a predetermined reference distance (e.g., 150) is combined with it. For example, the character element "0" and a character element "@" are combined with the character element "H", and a character element "^", a character element "^", and a character element "jz" are combined with the second character element. As a result, based on the character element string "B&" designated as a search keyword, new character element strings "B^", "S^fc", "H;*-", and so on are produced.

5 

Thereafter, the recognition result (document data)
of Figure 4 is searched for each of the new character element
strings. Therefore, one of the new character element strings
can be detected at the location where the character element
string "B^" exists in the original document. As a result, a
preferable search result can be obtained.

When no character element string is detected as a
result of the search, the reference distance may be reset
to a value (e.g., 200) which is greater than the predetermined
reference distance. Therefore, a greater number of
character element strings are additionally produced and a
similar retrieval process is executed. This makes it
possible to detect a recognition error which cannot be
detected when the tolerable distance (reference distance)
in the distance table is, for example, 150.
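The expansion of Example 4, together with the fallback to a larger reference distance, can be sketched as follows. This is a minimal sketch under stated assumptions: Latin letters stand in for the Japanese character elements of the figures, the distance values in `DISTANCE` are invented for illustration (they are not the values of Figure 2), and all function names are hypothetical.

```python
from itertools import product

# Hypothetical, symmetric character element distance table.
# Latin letters stand in for the Japanese character elements of the figures.
DISTANCE = {
    ("B", "8"): 90, ("B", "R"): 140, ("B", "E"): 180,
    ("O", "0"): 60, ("O", "Q"): 120, ("O", "D"): 190,
}
ALPHABET = sorted({c for pair in DISTANCE for c in pair})
UNRELATED = 10_000  # distance assumed for pairs missing from the table


def distance(a, b):
    if a == b:
        return 0
    return DISTANCE.get((a, b), DISTANCE.get((b, a), UNRELATED))


def similar(ch, reference):
    """The element itself plus every element closer than `reference`."""
    return [ch] + [c for c in ALPHABET if c != ch and distance(ch, c) < reference]


def expand_keyword(keyword, reference):
    """Develop the keyword into every combination of similar elements."""
    options = [similar(ch, reference) for ch in keyword]
    return ["".join(combo) for combo in product(*options)]


def search(document, keyword, reference):
    """Return sorted (position, variant) pairs for every variant found."""
    hits = []
    for variant in expand_keyword(keyword, reference):
        start = document.find(variant)
        while start != -1:
            hits.append((start, variant))
            start = document.find(variant, start + 1)
    return sorted(hits)


def search_with_fallback(document, keyword, references=(150, 200)):
    """Retry with a larger reference distance when nothing is found."""
    for reference in references:
        hits = search(document, keyword, reference)
        if hits:
            return reference, hits
    return None, []


print(len(expand_keyword("BO", 150)))             # 9 variants (3 x 3)
print(search_with_fallback("THE 8O FILE", "BO"))  # (150, [(4, '8O')])
print(search_with_fallback("THE EO FILE", "BO"))  # (200, [(4, 'EO')])
```

The number of variants is the product of the per-element option counts, so it grows quickly with the keyword length; a bounded reference distance keeps the expansion manageable.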

As described above, a character element string
designated as a search keyword is replaced, using a character
element distance table, with a plurality of character element
strings that reflect potential recognition errors. Each of the
plurality of character element strings is searched for.
Therefore, similar to Example 1, a designated character
element string can be retrieved from document data in which
the designated character element string is incorrectly
recognized as another character element string.



By using a distance table, complicated distance
calculation becomes unnecessary and a high-speed search can
be achieved.

Further, by using a distance table, the tolerance
level to recognition error can be set to an appropriate value,
so that an efficient search is made possible.

(Example 5)

In Example 5, using a character element distance 
table, one or more other character elements are added to 
each character element included in a recognition result 
(document data), and thereafter a search is performed. 

Hereinafter, description will be given of a
retrieval process in which a character element string "B^"
is designated as a search keyword, and the recognition
result of Figure 4 is searched for a character element string
corresponding to the search keyword. This retrieval
process is executed by the CPU 110 in accordance with the
document searching program.

Initially, a character element distance table is
referenced, and, for each character element included in a
recognition result (document data), every character element
whose distance from that character element is less than a
reference distance (e.g., 150) is added to the recognition
result as a candidate of the recognition result (e.g., the
character elements "H" and "M" for the character element "B",
and two similar character elements for the character element
"^").

Figure 15 shows an example in which candidates of
the recognition result are additionally provided. In the
example of Figure 15, the character element "@" is
additionally provided as a candidate of the recognition
result with respect to one character element of the
recognition result, and two character elements, including
the character element "m", are additionally provided as
candidates of the recognition result with respect to another
character element.

Thereafter, the character element string "B^"
designated as a search keyword is divided into the character
element "B" and the character element "^". The
recognition result is searched for the character element
"B", so that the character element "B" is retrieved from
the recognition result (document data). Thereafter, it is
determined whether or not the character element "^" is
included in the character elements (the recognized character
element and its added candidates) located next to the
detected character element "B". Since the character element
"^" is included in these character elements, it is
determined that the character element string "B^" is
retrieved from the recognition result.

Even when a character element string designated as
a search keyword includes three or more character elements,
the character element string can be retrieved from a
recognition result in a similar manner. Specifically, when
all of the character elements included in the character
element string designated as a search keyword are detected
in sequence in the recognition result, it may be determined
that the character element string is detected in the
recognition result.
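Example 5 moves the distance lookup to registration time: each recognized character element carries a set of candidates, and the search itself only checks set membership. The following is a minimal sketch under assumptions: the distance values and the sample text are invented, Latin letters stand in for the Japanese character elements, and the function names are hypothetical.

```python
# Hypothetical distance table (pairs absent from it are treated as dissimilar).
TABLE = {("B", "8"): 90, ("O", "Q"): 120, ("O", "D"): 190}


def add_candidates(recognized, table, reference=150):
    """At registration time, attach candidates to every recognized element:
    the element itself plus every element closer than `reference`."""
    lattice = []
    for ch in recognized:
        candidates = {ch}
        for (a, b), d in table.items():
            if d < reference:
                if a == ch:
                    candidates.add(b)
                elif b == ch:
                    candidates.add(a)
        lattice.append(candidates)
    return lattice


def find_in_lattice(lattice, keyword):
    """Start positions where every keyword element appears, in sequence,
    among the candidates of consecutive positions. No distance table is
    consulted at search time."""
    n = len(keyword)
    return [i for i in range(len(lattice) - n + 1)
            if all(keyword[k] in lattice[i + k] for k in range(n))]


# "O" was misrecognized as "Q"; the candidate set still contains "O".
lattice = add_candidates("XBQY", TABLE)
print(find_in_lattice(lattice, "BO"))  # [1]
```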

As described above, one or more character elements
are additionally provided with respect to each character
element included in a recognition result (document data)
using a table indicating distances between character
elements. Therefore, similar to Example 1, even when a
designated character element string is replaced with
another character element string due to recognition error,
a character element string corresponding to a designated
character element string can be retrieved from document
data.

Further, for each character element included in a 
recognition result (document data), one or more character 
elements are additionally provided in advance, thereby 
making it possible to omit the step of referencing a distance 
table upon search. 

It should be noted that the use of a distance table
in a search as described in Examples 1 through 3, and the
development of a character element string designated as a
search keyword into a plurality of character element strings
using a distance table as described in Example 4, can each
be used in combination with Example 5.

(Example 6)

Figure 16 shows an exemplary original document
including paragraphs A through D. In the example of
Figure 16, it is assumed that the proper order of the
paragraphs is paragraph A, paragraph B, paragraph C, and
paragraph D. Specifically, it is assumed that the last
sentence of paragraph A is followed by the leading sentence
of paragraph B, the last sentence of paragraph B is followed
by the leading sentence of paragraph C, and the last sentence
of paragraph C is followed by the leading sentence of
paragraph D.



When the original document of Figure 16 is subjected
to character recognition to obtain document data
(recognition result), the order of the paragraphs in the
original document is not necessarily recognized properly.
The layout pattern of paragraphs varies among documents,
and therefore it is extremely difficult to automatically
recognize the order of the paragraphs. Therefore, the order
of the paragraphs may be incorrectly recognized.

Example 6 provides a retrieval method capable of
retrieving a character element string from a recognition
result even when the order of the paragraphs is incorrectly
recognized.



Figure 17 shows an exemplary table indicating a
result of recognizing the order of paragraphs. Such a table
is produced by the CPU 110 executing the document
registering program, and is then stored in the memory 140.

The recognized paragraphs are identified by
respective specific paragraph labels (A, B, C, and D in
Figure 17).

The table of Figure 17 shows, for each paragraph,
the paragraph label(s) of the paragraph(s) having the
possibility of being concatenated with the subject
paragraph, and a recognition result of the subject paragraph.
For example, the first row of the table of Figure 17
indicates that a paragraph having the possibility of being
concatenated with paragraph A is either paragraph B or
paragraph C, and that the last phrase of paragraph A is
"H/fc^".

A paragraph having the possibility of being
concatenated with a specific paragraph is determined by
referencing the positional relationships between the
paragraphs in character recognition of the image data of an
original document. For example, when the original document
is in the form of vertical writing, a paragraph having the
possibility of being concatenated with a certain
paragraph X is determined to be either a paragraph located
below paragraph X (e.g., paragraph Y) or a paragraph
located to the left of paragraph X (e.g., paragraph Z). In
this case, paragraphs Y and Z are registered in the
above-described table as paragraphs having the possibility
of being concatenated with paragraph X.

Alternatively, a paragraph whose leading
sentence can be grammatically concatenated with the
last sentence of paragraph X may be determined as a paragraph
having the possibility of being concatenated with
paragraph X.

Alternatively, when the paragraphs of an original
document are laid out under a specific rule, a paragraph
having the possibility of being concatenated with
paragraph X may be determined based on such a specific rule.

Hereinafter, description will be given of a
retrieval process of searching the recognition result for
a character element string "B^©AP" using the table of
Figure 17. The retrieval process is executed by the CPU 110
in accordance with the document searching program.



Similar to Examples 1 through 5, each paragraph is
searched for the character elements "B", "^", "©", "A", and
"P" included in the character element string "B^©AP",
in sequence.

In the example of Figure 17, the character elements
"B", "^", and "©" are sequentially detected at the end
of paragraph A. Thereafter, the character element "A" is
searched for. In this case, a paragraph having the
possibility of being concatenated with paragraph A is
paragraph B or paragraph C. Therefore, it is determined
whether the character element "A" exists at the leading
position of paragraph B or paragraph C. In the example
of Figure 17, the character element "A" is detected at the
leading position of paragraph B. Then it is determined
whether the character element "P" exists at the location
next to the character element "A". As a result, all of the
character elements included in the character element string
"B^©AP" are eventually retrieved.

Figure 18 shows the steps of a retrieval process of
searching a recognition result for a character element
string designated as a search keyword using the table of
Figure 17 and a character element distance table (e.g., that
of Figure 2) while taking into account a concatenation
relationship between paragraphs. This retrieval process is
executed by the CPU 110 in accordance with the document
searching program.

By the retrieval process of Figure 18, it is
determined whether a character element corresponding to a
character element "X" included in a character element string
as a search keyword is included in a recognition result. A
similar retrieval process is repeated for all character
elements included in the character element string as a search
keyword.

It is assumed that the character element "X" does
not match any of the character elements at the end portion of
paragraph A. In this case, it is determined whether the
character element "X" matches the leading character element
"Y" of one of paragraphs B and C having the possibility of
being concatenated with paragraph A. The fact that a
paragraph having the possibility of being concatenated with
paragraph A is paragraph B or paragraph C is defined in
advance in the table of Figure 17.

Whether the character element "X" matches the
character element "Y" is determined using the character
element distance table. This determination is as described
in Examples 1 through 5.
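The retrieval of Example 6, in which a match that reaches the end of a paragraph continues at the head of every paragraph registered as a possible successor, can be sketched as follows. This is a minimal sketch, not the flow of Figure 18: the paragraph texts, the successor table, and the distance values are invented stand-ins for Figures 17 and 2, Latin letters replace the Japanese character elements, the function names are hypothetical, and paragraphs are assumed to be non-empty.

```python
# Hypothetical stand-ins for Figure 17 (successor table) and Figure 2 (distances).
PARAGRAPHS = {"A": "abcTOK", "B": "Y0def", "C": "ghi", "D": "jkl"}
SUCCESSORS = {"A": ["B", "C"], "B": ["C", "D"], "C": ["D"]}
DISTANCE = {("O", "0"): 60}
REFERENCE = 150


def same(x, y):
    """Character elements match if identical or closer than REFERENCE."""
    if x == y:
        return True
    d = DISTANCE.get((x, y), DISTANCE.get((y, x)))
    return d is not None and d < REFERENCE


def match_from(label, pos, keyword, k):
    """Try to match keyword[k:] starting at position `pos` of paragraph `label`."""
    text = PARAGRAPHS[label]
    while k < len(keyword) and pos < len(text):
        if not same(keyword[k], text[pos]):
            return False
        k, pos = k + 1, pos + 1
    if k == len(keyword):
        return True
    # End of the paragraph reached: continue at the head of each possible successor.
    return any(match_from(nxt, 0, keyword, k) for nxt in SUCCESSORS.get(label, []))


def search(keyword):
    """Return (paragraph label, start position) for every match."""
    return [(label, pos)
            for label, text in PARAGRAPHS.items()
            for pos in range(len(text))
            if match_from(label, pos, keyword, 0)]


# "TOK" ends paragraph A; "YO" (recognized as "Y0") heads paragraph B.
print(search("TOKYO"))  # [('A', 3)]
```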

As described above, a plurality of paragraphs having
the possibility of being concatenated with a specific
paragraph are defined in advance. Therefore, even when the
concatenation relationship between paragraphs is
incorrectly recognized in character recognition, a
character element string extending over a plurality of
paragraphs can be appropriately detected.

Besides the concatenation between paragraphs,
a concatenation between lines in a paragraph may be ambiguous
(e.g., when a figure, a table, a caption, or the like exists
between lines). In this case, each line is similarly given
a different number, and for a specific line, a plurality
of lines having the possibility of being concatenated with
the specific line are defined in advance. Therefore, it is
possible to appropriately detect a character element string
extending over a plurality of lines.

Further, a concatenation between character
elements may be ambiguous (e.g., when a figure, a table, or
the like is inserted between character elements, or when a
character element string is decoratively arranged, e.g., in
the form of a curve). In this case, each line (character
element) is similarly given a different number, and a
plurality of lines (character elements) having the
possibility of being concatenated with a specific line
(character element) are defined in advance. Therefore, a
character element string extending over a plurality of lines
(character elements) can be appropriately detected.

As described above, for a specific character element
of a plurality of character elements included in a
recognition result, a plurality of character elements
having the possibility of being concatenated with the
specific character element are defined in advance. In a
retrieval process, the specific character element is
concatenated with one of the plurality of character elements
having the possibility of being concatenated with the
specific character element, and it is determined whether the
resultant character element string matches at least a
portion of a character element string as a search keyword.
Therefore, even when a concatenation relationship between
character elements is incorrectly recognized in character
recognition, a character element string extending over a
plurality of character elements can be appropriately
detected.

The specific character element may be located at the
end of a row or column, while each of the plurality of
character elements having the possibility of being
concatenated with the specific character element may be
located at the head of a row or column.

Further, the specific character element and one of
the plurality of character elements having the possibility
of being concatenated with the specific character element
may be located on the same row or column, while the specific
character element and another one of the plurality of
character elements may be located on different rows but the
same column, or on different columns but the same row.

It should be noted that in the above examples, a
plurality of paragraphs (lines or character elements)
having the possibility of being concatenated with a specific
paragraph (line or character element) are defined in advance.
Alternatively, a plurality of paragraphs (lines or
character elements) having the possibility of preceding a
specific paragraph (line or character element) may be
defined in advance. In this case, the same effects as
described above can be obtained.

A paragraph label (line or character element) having
the possibility of being concatenated with a specific
paragraph (line or character element) may be represented
by the absolute value of the paragraph label (line or
character element), as described above, or alternatively by
the relative value of the paragraph label (line or character
element). For example, paragraph B and paragraph C having
the possibility of being concatenated with paragraph A may
be represented by paragraph +1 and paragraph +2,
respectively.
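Resolving relative paragraph labels into absolute labels is a simple offset lookup. The sketch below assumes, hypothetically, that the labels are kept in a list in their recognized order; the function name `to_absolute` is illustrative only.

```python
def to_absolute(labels, current, offsets):
    """Resolve relative labels such as +1 and +2 against the ordered list of
    recognized paragraph labels, dropping offsets that fall outside it."""
    i = labels.index(current)
    return [labels[i + off] for off in offsets if 0 <= i + off < len(labels)]


labels = ["A", "B", "C", "D"]
print(to_absolute(labels, "A", [+1, +2]))  # ['B', 'C']
print(to_absolute(labels, "D", [+1]))      # []
```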
(Example 7)

Figure 19 shows an exemplary table indicating a
recognition result of a concatenation relationship between
paragraphs. Such a table is produced by the CPU 110
executing the document registering program, and is then
stored in the memory 140.

The table of Figure 19 indicates, for each paragraph,
a recognition result and the location of the paragraph. The
location of a paragraph is, for example, represented in an
X-Y coordinate system whose origin is the upper right corner
of the original document. The X and Y axes are oriented as
shown in Figure 16, for example. For instance, the first
line in the table of Figure 19 indicates that the end of
paragraph A is "Ef>fc£>" and the location of paragraph A is
the coordinate point (X, Y) = (10, 100).

A retrieval process is substantially similar to that
of Example 6. When the character elements "B", "^", and
"©" are sequentially detected and thereafter the character
element "A" is searched for, a paragraph having the
possibility of being concatenated with paragraph A is
determined based on the location of paragraph A stored in
the table of Figure 19. In this case, the coordinate point
of paragraph A is (X, Y) = (10, 100). As paragraphs having
the possibility of being concatenated with paragraph A,
paragraph C [(X, Y) = (10, 200)], which has the next larger
Y coordinate after paragraph A, and paragraph B [(X, Y) =
(100, 100)], which has the next larger X coordinate after
paragraph A, are determined.
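Determining the candidate successors from the stored coordinates can be sketched as follows. The sketch follows the example in the text (the paragraph with the next larger Y coordinate in the same column, and the paragraph with the next larger X coordinate in the same row). The position of paragraph D, the tolerance parameter `tol`, and the function name are assumptions added for illustration.

```python
# Coordinates of Figure 19 for A, B, and C; D's position is hypothetical.
COORDS = {"A": (10, 100), "B": (100, 100), "C": (10, 200), "D": (100, 200)}


def candidate_successors(coords, label, tol=0):
    """Return up to two candidate successors of `label`: the nearest paragraph
    below it (same X, next larger Y) and the nearest paragraph in the same
    row (same Y, next larger X). `tol` allows small alignment errors."""
    x0, y0 = coords[label]
    below = [(y, p) for p, (x, y) in coords.items()
             if p != label and abs(x - x0) <= tol and y > y0]
    beside = [(x, p) for p, (x, y) in coords.items()
              if p != label and abs(y - y0) <= tol and x > x0]
    result = []
    if below:
        result.append(min(below)[1])
    if beside:
        result.append(min(beside)[1])
    return result


print(candidate_successors(COORDS, "A"))  # ['C', 'B']
```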



In this case, since the character element "A" is
detected at the leading position of paragraph B, it is
determined whether the character element "P" exists next to
the character element "A". As a result, all of the character
elements included in the character element string "B^©AP"
are eventually detected.

It should be noted that when the original document is in
the form of vertical writing, a paragraph having the
possibility of being concatenated with a certain paragraph X
may be determined to be either a paragraph located below
paragraph X (e.g., paragraph Y) or a paragraph located to
the left of paragraph X (e.g., paragraph Z).

Alternatively, when the paragraphs of an original
document are laid out under a specific rule, a paragraph
having the possibility of being concatenated with
paragraph X may be determined based on such a specific rule.

It should be noted that the origin of the
X-Y coordinate system and the directions of the X coordinate
axis and the Y coordinate axis can be selected freely.
Further, the ordinal values assigned to the respective
paragraphs or figures may be used as the unit of a
coordinate value.

As described above, for each paragraph, information
indicating the location of the paragraph is held. Therefore,
even when the concatenation relationship between paragraphs
is incorrectly recognized in character recognition, a
character element string extending over a plurality of
paragraphs can be appropriately detected.

Further, by holding information indicating the
coordinates of paragraphs, the method for determining a
paragraph having the possibility of being concatenated with
a specific paragraph can be changed without changing the
document data. The coordinates indicating the locations of
the paragraphs can also be used to reproduce the layout of a
document.

It should be noted that in the above-described
example, the coordinate indicating the location of each
paragraph is held. Alternatively, each line or each
character element may be given a different number and a
coordinate indicating the location of each line or each
character element may be held. In this case, a character
element string extending over a plurality of lines or a
plurality of character elements can be searched for.

(Example 8)

Similar to Example 6, description will be given of
a retrieval process in which document data (recognition
result) obtained by subjecting the original document shown
in Figure 16 to character recognition is searched for the
character element string "B^©AP".

In this case, as shown in Figure 20, the recognition
result is held in the form of a specific paragraph
concatenated with one of a plurality of paragraphs having
the possibility of being concatenated with the specific
paragraph. Such a recognition result is stored in the
memory 140, for example.

In the example of Figure 20, two recognition results
(character recognition result 1 and character recognition
result 2) are held. The character recognition result 1 is
obtained by concatenating paragraph A with paragraph C,
which has the possibility of being concatenated with
paragraph A. The character recognition result 2 is
obtained by concatenating paragraph A with paragraph B,
which has the possibility of being concatenated with
paragraph A.

When the character element string "B^©AP" is searched
for, both the character recognition result 1 and the
character recognition result 2 of Figure 20 are searched.
As a result, the character element string "B^©AP" is
retrieved from the character recognition result 2.

It should be noted that when, as shown in Figure 20,
a recognition result is held while assuming a plurality of
concatenation relationships between paragraphs, an upper
limit (e.g., ten character elements) may be placed on the
number of character elements included in a character
element string designated as a search keyword. In this case,
only the first nine character elements from the leading
positions of paragraphs B and C having the possibility of
being concatenated with paragraph A are concatenated with
paragraph A, and the resultant paragraph A is held. Since a
character element string extending over paragraph A and
paragraph B or C contains at least one character element of
paragraph A, at most nine of its character elements lie in
paragraph B or C. Thus, a character element string of ten or
fewer character elements extending over paragraph A and
paragraph B or C can be searched for.
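Building the pre-concatenated recognition results of Example 8, with the head of each possible successor truncated to one element fewer than the keyword length limit, can be sketched as follows. The paragraph texts and function names are hypothetical, Latin letters stand in for the Japanese character elements, and a plain substring test stands in for the conventional retrieval process.

```python
PARAGRAPHS = {"A": "abcTOK", "B": "YOdefghijklm", "C": "ghi"}
SUCCESSORS = {"A": ["C", "B"]}  # result 1: A + C, result 2: A + B (cf. Figure 20)


def concatenated_results(label, max_keyword_len=10):
    """Hold one recognition result per possible successor. Only the first
    max_keyword_len - 1 elements of the successor are appended, because a
    keyword that crosses the boundary keeps at least one element in `label`."""
    text = PARAGRAPHS[label]
    head_len = max_keyword_len - 1
    nexts = SUCCESSORS.get(label, [])
    return [text + PARAGRAPHS[n][:head_len] for n in nexts] or [text]


results = concatenated_results("A")
print(results)  # ['abcTOKghi', 'abcTOKYOdefghij']
# Any ordinary substring search now works on the held results.
print([i + 1 for i, r in enumerate(results) if "TOKYO" in r])  # [2]
```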

As described above, by taking into account a
plurality of paragraphs having the possibility of being
concatenated with a specific paragraph, a plurality of
recognition results are held in advance. Therefore, even
when the concatenation relationship between paragraphs is
incorrectly recognized in character recognition, a
character element string extending over a plurality of
paragraphs can be appropriately detected. Further, by
holding a plurality of recognition results in advance, the
steps of a retrieval process are simplified, thereby making
it possible to use a conventional retrieval process.

(Example 9) 

Description will be given of a retrieval process of
searching document data (recognition result), which has
been obtained by subjecting an original document having a
layout as shown in Figure 21 to character recognition, for a
character element string "WF".

In the example of Figure 21, the sentences in the
original document are in the form of horizontal writing.
In this case, the distance between character elements is
small in the vertical direction. Therefore, a concatenation
relationship between character elements is likely to be
incorrectly recognized in character recognition.

In Example 9, a retrieval method is provided in which a
character element string can be appropriately retrieved
from a recognition result even when a concatenation
relationship between character elements is incorrectly
recognized in character recognition.

Figure 22A shows a table indicating a recognition
result of character recognition under the assumption that
an original document is in the form of vertical writing.
The recognition result of Figure 22A is a result of
recognizing the original document of Figure 21 on a
column-by-column basis.

Figure 22B shows a table indicating a recognition
result of character recognition under the assumption that
an original document is in the form of horizontal writing.
The recognition result of Figure 22B is a result of
recognizing the original document of Figure 21 on a
row-by-row basis.

The above-described tables are produced by the
CPU 110 executing the document registering program, and
then stored in the memory 140.

When the character element string "WF" is
designated as a search keyword, the recognition result of
Figure 22A is searched for the character element string
"WF", while at the same time the recognition result of
Figure 22B is searched for the character element string
"WF". As a result, the character element string "WF" is
retrieved from the recognition result corresponding to line
number 3 of the table of Figure 22B. Therefore, it is found
that the original document of Figure 21 includes the
character string "WF".
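Holding one recognition result per assumed layout and searching all of them can be sketched as follows. The character grid, the right-to-left column order used for vertical writing, and the function names are assumptions for illustration; they are not the contents of Figures 21, 22A, and 22B.

```python
def read_rows(grid):
    """Horizontal-writing reading: one line per row, top to bottom."""
    return list(grid)


def read_columns(grid):
    """Vertical-writing reading: one line per column, each read top to
    bottom, starting from the rightmost column (assumed column order)."""
    width = max(len(row) for row in grid)
    padded = [row.ljust(width) for row in grid]
    return ["".join(row[c] for row in padded) for c in range(width - 1, -1, -1)]


def search_all_layouts(grid, keyword):
    """Search every held recognition result; report (layout, line number)."""
    hits = []
    for layout, lines in (("vertical", read_columns(grid)),
                          ("horizontal", read_rows(grid))):
        for number, line in enumerate(lines, start=1):
            if keyword in line:
                hits.append((layout, number))
    return hits


grid = ["abcd",
        "xyWF",
        "pqrs"]
print(search_all_layouts(grid, "WF"))  # [('horizontal', 2)]
```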

As described above, recognition results
corresponding to a plurality of layouts are held. Therefore,
even when it is difficult to recognize the proper layout
of a document, a character element string can be retrieved
from a recognition result of such a document. Further, by
holding recognition results corresponding to a plurality
of layouts, it is possible to use a conventional retrieval
process.

It should be noted that in the above-described
example, two layouts, i.e., vertical writing and horizontal
writing, are assumed. The present invention is not limited
to these layouts. For example, a layout may be obliquely
oriented. Such a layout can be processed in a manner similar
to that for vertical and horizontal writing layouts.

(Example 10) 

Example 10 provides a retrieval method in which a
designated character element string can be properly
retrieved from a recognition result even when the layout
of an original document is incorrectly recognized.

Description will be given of a retrieval process of
searching document data (recognition result), which has
been obtained by subjecting an original document having a
layout as shown in Figure 21 to character recognition, for
the character element string "WF".

It is now assumed that the layout of the original
document is incorrectly recognized, so that only the table
of Figure 22A is held. Unlike Example 9, the table of
Figure 22B is not held.

Initially, the character element string "WF" is
divided into a character element "W" and a character element
"F". Each character element is searched for using the table
of Figure 22A. As a result, the character element "W" is
retrieved from the third character of the recognition result
corresponding to line number 5 of the table of Figure 22A,
and the character element "F" is retrieved from the third
character of the recognition result corresponding to line
number 4 of the table of Figure 22A.

When all of the character elements included in the
character element string "WF" are detected in the
recognition result, whether the character element string
"WF" is detected is determined based on the positional
relationship between the detected character elements. In
this case, the character element "W" and the character
element "F" are detected at the same character ordinal
position in adjacent rows. Therefore, it is determined that
the character element string "WF" is detected.

It should be noted that whether a character element
string is detected may be determined using a criterion
different from the above-described one. Specifically, even
when the positional relationship between the detected
character elements differs from the above-described
relationship, it may be determined that the character
element string is detected. For example, when coordinates
indicating the positions of character elements are known,
it may be determined that the character element string is
detected if the distance between adjacent character elements
is less than or equal to a predetermined distance and the
character elements are arranged linearly.

Further, another set of steps of a retrieval process
may be used. For example, instead of searching for all of the
character elements as described above, the lines adjacent to
the line in which the character element "W" has been detected
may be searched for the character element "F" only when the
character element "W" is detected. Therefore, an unnecessary
portion of the retrieval process can be omitted, thereby
making it possible to perform the retrieval process
efficiently.

As described above, each character element included
in a character element string designated as a search keyword
is detected in a recognition result, and whether the
character element string is detected is determined based
on the positional relationship between the detected
character elements. Therefore, even when the layout of an
original document is incorrectly recognized, a designated
character element string can be appropriately detected.
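The positional check of Example 10, in its efficient form (the second character element is looked for only at the position required by the first), can be sketched as follows. The sketch assumes that consecutive character elements must share the same ordinal position in consecutive lines, in either direction; the sample lines and the function name are hypothetical and are not the contents of Figure 22A.

```python
def find_across_lines(lines, keyword):
    """Find `keyword` laid out across lines of a recognition result held in
    the wrong layout: element k must sit at the same ordinal position as
    element 0, k lines away. Returns (line index, position, step) triples."""
    hits = []
    steps = (1, -1) if len(keyword) > 1 else (1,)
    for li, line in enumerate(lines):
        for pos, ch in enumerate(line):
            if ch != keyword[0]:
                continue
            # Only the positions required by the first element are inspected.
            for step in steps:
                ok = True
                for k in range(1, len(keyword)):
                    lj = li + k * step
                    if (not (0 <= lj < len(lines))
                            or pos >= len(lines[lj])
                            or lines[lj][pos] != keyword[k]):
                        ok = False
                        break
                if ok:
                    hits.append((li, pos, step))
    return hits


# Column-by-column reading of a horizontally written page (lines are columns).
lines = ["dFs", "cWr", "byq", "axp"]
print(find_across_lines(lines, "WF"))  # [(1, 1, -1)]
```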

(Example 11)

In Example 11, a character element string
designated as a search keyword is divided into two or more
character element strings, and whether the search keyword
is detected is determined based on a positional
relationship between the paragraphs in which the divided
character element strings are detected.

Hereinafter, a description will be given of a
retrieval process in which document data (recognition
result) obtained by subjecting the original document shown
in Figure 16 to character recognition is searched for the
character element string "B^©AP".

Similar to Example 7, a table (Figure 19) 
indicating a recognition result of a concatenation 
relationship between paragraphs is provided in advance. 

The table of Figure 19 indicates, for each paragraph,
a recognition result and the location of the paragraph. The
location of a paragraph is, for example, represented in an
X-Y coordinate system whose origin is the upper right corner
of the original document. The X and Y axes are oriented as
shown in Figure 16, for example.

Initially, the character element string "B^©AP"
designated as a search keyword is divided into two
character element strings. For example, the character
element string "B^©AP" is divided into a character
element string "B^" and a character element string "©AP".

Thereafter, each paragraph is searched for the two
divided character element strings. The search is repeated
for all division patterns of the character element string
"B^©AP". For example, in Figure 16, when the character
element string "B^©AP" is divided into a character element
string "B^©" and a character element string "AP", the
character element string "B^©" is detected at the end of
paragraph A and the character element string "AP" at the
head of paragraph B. When all of the divided character
element strings are detected, whether the character element
string "B^©AP" is detected is determined based on the
positional relationship between the paragraphs in which
they are detected.

For example, when the paragraphs in which the two character element strings are detected are adjacent or close to each other, it is determined that the character element string designated as a search keyword is detected. In the example of Figure 16, paragraph A, in which the character element string "BAGD" is detected, is located at (X, Y) = (10, 100), and paragraph B, in which the character element string "AD" is detected, is located at (X, Y) = (100, 100). The two paragraphs have the same Y coordinate and are adjacent to each other. Therefore, it is determined that the character element string "BAGDAD" is detected in the recognition result.

Further, when a character element string is divided into a plurality of character element strings and the paragraphs are searched for them as described above, it is preferable to search only the end and the head of each paragraph, thereby improving the efficiency of the retrieval process.
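The two-part search above can be sketched in Python. This is an illustrative sketch, not the patented implementation: the `Paragraph` record, the adjacency rule in `are_adjacent` (same Y coordinate, X locations within a hypothetical `max_gap`), and the paragraph texts are assumptions; only the coordinates (10, 100) and (100, 100) follow the Figure 16 example. As the text recommends, only the end of one paragraph and the head of another are examined.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Paragraph:
    """A recognized paragraph and the (x, y) location of the paragraph."""
    name: str
    text: str
    x: int
    y: int


def are_adjacent(a: Paragraph, b: Paragraph, max_gap: int = 100) -> bool:
    # Hypothetical rule: same Y coordinate and X locations at most max_gap apart.
    return a.y == b.y and 0 < abs(b.x - a.x) <= max_gap


def find_split_keyword(keyword: str,
                       paragraphs: List[Paragraph]) -> List[Tuple[str, str, str, str]]:
    """Try every two-part division of keyword, checking only paragraph ends and heads."""
    hits = []
    for cut in range(1, len(keyword)):
        head, tail = keyword[:cut], keyword[cut:]
        for a in paragraphs:
            if not a.text.endswith(head):  # first part must end paragraph a
                continue
            for b in paragraphs:
                # second part must open a different paragraph adjacent to a
                if b is not a and b.text.startswith(tail) and are_adjacent(a, b):
                    hits.append((a.name, b.name, head, tail))
    return hits


paragraphs = [
    Paragraph("A", "direct flights to BAGD", 10, 100),
    Paragraph("B", "AD depart every morning", 100, 100),
    Paragraph("C", "AD hoc schedule", 500, 300),
]
print(find_split_keyword("BAGDAD", paragraphs))  # [('A', 'B', 'BAGD', 'AD')]
```

Paragraph C also begins with "AD", but it is rejected because it is not adjacent to paragraph A, which is the role the positional check plays in the text.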



It should be noted that, in the above-described example, the character element string designated as a search keyword is divided into two character element strings. The character element string designated as a search keyword may also be divided into three or more character element strings. In this case, a similar retrieval process can be performed.



In the above-described example, a character element string extending over a plurality of paragraphs is searched for. Similarly, a character element string extending over a plurality of lines can be searched for. In this case, the character element string designated as a search keyword is divided into a plurality of character element strings, and each line is searched for the divided character element strings. When all of the divided character element strings are detected at adjacent locations, it may be determined that the search keyword is detected.
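Division into more than two parts, applied to lines, might be sketched as follows. The rule that each middle piece must occupy a whole line is an assumption made for illustration (the text only requires the pieces to be detected at adjacent locations), and `divisions`, `spans_lines`, and the sample lines are hypothetical.

```python
from itertools import combinations
from typing import Iterator, List


def divisions(keyword: str, parts: int) -> Iterator[List[str]]:
    """Yield every way of cutting keyword into `parts` non-empty pieces."""
    for cuts in combinations(range(1, len(keyword)), parts - 1):
        bounds = (0, *cuts, len(keyword))
        yield [keyword[i:j] for i, j in zip(bounds, bounds[1:])]


def spans_lines(keyword: str, lines: List[str], parts: int) -> bool:
    """Is keyword split over `parts` consecutive lines?

    Assumed rule: the first piece ends a line, each middle piece is a whole
    line, and the last piece begins the line that follows them."""
    for pieces in divisions(keyword, parts):
        for start in range(len(lines) - parts + 1):
            window = lines[start:start + parts]
            if (window[0].endswith(pieces[0])
                    and all(w == p for w, p in zip(window[1:-1], pieces[1:-1]))
                    and window[-1].startswith(pieces[-1])):
                return True
    return False


lines = ["we flew to BA", "GD", "AD today"]
print(list(divisions("ABCD", 3)))       # [['A', 'B', 'CD'], ['A', 'BC', 'D'], ['AB', 'C', 'D']]
print(spans_lines("BAGDAD", lines, 3))  # True
print(spans_lines("BAGDAD", lines, 2))  # False
```

Enumerating cut positions with `itertools.combinations` covers every division pattern, which corresponds to repeating the search for all division patterns of the keyword.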

As described above, in Examples 6 through 11, even when paragraphs or lines are incorrectly (or irregularly) concatenated with each other, a character element string extending over a plurality of paragraphs can be properly detected.

Further, a designated character element string can be searched for even when there is a character recognition error, when paragraphs or lines are incorrectly or irregularly concatenated with each other, or when vertical or horizontal writing is incorrectly recognized or irregularly used.

It should be noted that each of Examples 1 through 11 can be performed alone, or a combination of at least two of Examples 1 through 11 can be performed.

The retrieval processes of the present invention can typically be performed by software on a computer. Alternatively, the retrieval processes of the present invention may be performed by hardware or by a combination of software and hardware.

A program (document searching program) representing part or all of the retrieval process of the present invention is stored, for example, in the memory 170. Alternatively, the document searching program may be recorded on any type of recording medium, such as a floppy disk, a CD-ROM, or a DVD-ROM. The document searching program recorded on such a recording medium is loaded into a memory via a disk drive (not shown). Alternatively, the document searching program (or a part thereof) may be downloaded into a memory in a computer via a communication network or broadcasting. The computer serves as a retrieval device when a CPU incorporated in the computer executes the document searching program.

(Example 12) 

In Example 12, a probability (evaluated value) that a search result matches the search keyword is obtained based on the number of character elements included in the character element string designated as a search keyword and the number of those character elements that match the recognition result. The correctness of the search result is determined based on this probability (evaluated value).

In the description below, a "character element" is simply referred to as a "character".



Generally, words in various languages have redundancy. Even if several characters in a word are unknown, the word can often be identified. This tendency becomes more pronounced as the number of characters in a word increases. This example shows that, by using this tendency, a word can be retrieved from a character string that includes an error.

Hereinafter, referring to Figure 26, a description will be given of a retrieval process in which a recognition result, obtained by subjecting an original document to character recognition, is searched for an eight-character search word. In this case, the recognition result is stored as document data in the memory 140. The memory 140 may be any type of storage medium.

It should be noted that each character in the recognition result is given a reliability indicating the probability that the recognition result is correct. A probability table is prepared in advance, before the search is executed.

Figure 27 shows an exemplary probability table. The probability table is used to obtain a probability Pa(n, k) for parameters n and k, where n is the number of characters included in a search word and k is the number of those n characters that match the corresponding character in the searched document data. The probability Pa(n, k) indicates the probability that a search result matches the search word.

The probability table of Figure 27 may be calculated using a large amount of error-free text and a word dictionary. In one calculation method, for every set of n consecutive characters in the text data, the number of characters matching the characters of a word is examined. All sets of n consecutive characters are categorized and counted by the number of matched characters to obtain the cumulative totals Nk (k = 1, ..., n). Using the cumulative totals Nk, the probability that a set of n consecutive characters having k matched characters out of the n characters is identified as the search word can be calculated as Pa(n, k) (= Nn/Nk).

The probability Pa(n, k) varies depending on the word notation and on the positions of the k matched characters, even if the number of characters n is the same. In this example, the probability Pa(n, k) is treated as independent of the word notation and of the positions of the k matched characters. In other words, the probability Pa(n, k) is calculated by accumulating the counts for all words having the same number of characters n and k matched characters, and using the sum (or average) of the cumulative totals.

It should be noted that the probability Pa(n, k) may instead be calculated for each word notation, for each positional pattern of matched characters, or for each character type of words (Japanese Kanji, Japanese Hiragana, Japanese Katakana, the English alphabet, etc.).
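The counting procedure described above can be sketched as follows. The sketch computes the ratio exactly as stated in the text, Pa(n, k) = Nn/Nk, pooling the counts of every dictionary word of length n; ignoring windows with no matched character (k = 0) is an assumption, and the toy text and dictionary are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def match_count(window: str, word: str) -> int:
    """Number of positions at which window and word hold the same character."""
    return sum(1 for a, b in zip(window, word) if a == b)


def build_pa_table(text: str, words: List[str]) -> Dict[Tuple[int, int], float]:
    """Pa(n, k) = Nn / Nk, with counts pooled over all dictionary words of length n."""
    counts: Dict[int, Counter] = {}
    for word in words:
        n = len(word)
        tally = counts.setdefault(n, Counter())
        # Slide a window of n consecutive characters over the error-free text.
        for i in range(len(text) - n + 1):
            k = match_count(text[i:i + n], word)
            if k > 0:  # assumption: windows with no matched character are ignored
                tally[k] += 1
    table = {}
    for n, tally in counts.items():
        exact = tally[n]  # Nn: windows identical to a dictionary word
        for k, nk in tally.items():
            table[(n, k)] = exact / nk
    return table


print(build_pa_table("abcabdabx", ["abc"]))  # {(3, 3): 1.0, (3, 2): 0.5}
```

Keeping a separate tally per word notation or per character type, instead of one tally per length n, would give the finer-grained tables mentioned above.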

It should be noted that, for words having a large number of characters, the probability for a given number of matched characters k may depend only on k and not on the number of characters n (that is, Pa(n, k) is substantially constant even if n varies). In this case, a probability Pa(k) depending only on the number of matched characters k may be used instead of the probability Pa(n, k).

In the example shown in Figure 26, the document data to be searched matches the search word in all characters except for one misrecognized character (reliability 0.42).

The correctness Pw of the checked portion (search result) is represented by

Pw = Pa(n, k) × Pb(k)   (formula 1),

where Pa(n, k) is the probability that the search result is identified as the search keyword when the search result matches the search keyword in k of n characters. In this case, the search result matches the search keyword in k = 7 of the n = 8 characters of the search word. Therefore, according to Figure 27, Pa(8, 7) = 0.9.

Pb(k) indicates the probability that all of the k characters are the characters to be searched for. Since the reliability given to each character is a probability, Pb(k) may be the product of the reliabilities of the characters matching the recognition result. In Figure 26, Pb(7) = 0.98 × 0.97 × 0.99 × 0.98 × 0.99 × 0.97 × 0.96 ≈ 0.85.

Therefore, the value indicating the correctness of the checked portion (search result) is Pw = Pa(8, 7) × Pb(7) = 0.9 × 0.85 = 0.765. The value of Pw is greater than a predetermined threshold (0.6 in this example). Therefore, the correctness of this checked portion (search result) is acknowledged.
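Formula (1) can be checked numerically with a short sketch. Only the Pa(n, k) values quoted in the text are included in the hypothetical `PA` dictionary, the seven reliabilities are chosen so that their product is about 0.85 as in the Figure 26 example, and `correctness` is an illustrative name.

```python
from math import prod
from typing import List

# Pa(n, k) values quoted in the text from the probability table of Figure 27.
PA = {(8, 7): 0.90, (8, 6): 0.85}


def correctness(n: int, matched_reliabilities: List[float]) -> float:
    """Formula (1): Pw = Pa(n, k) x Pb(k), Pb(k) = product of the k matched reliabilities."""
    k = len(matched_reliabilities)
    return PA[(n, k)] * prod(matched_reliabilities)


reliabilities = [0.98, 0.97, 0.99, 0.98, 0.99, 0.97, 0.96]  # seven matched characters
pw = correctness(8, reliabilities)
print(round(pw, 3), pw > 0.6)  # 0.765 True
```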

It should be noted that, for a word having a large number of characters, the product of reliabilities Pb(k) tends to be small. Therefore, Pb(k) may be normalized by some method. When the reliability of each character is not a probability, the reliability may be converted to a probability, or the simple average of the reliabilities may be used instead of Pb(k).

Referring to Figure 26, when some of the seven correct characters have a low reliability, excluding such characters from the count can make the resulting Pw larger. Therefore, the number of correct characters k may be selected so as to make Pw larger. (If seven characters are treated as correct, Pw = Pa(8, 7) × Pb(7) = 0.90 × (0.98 × 0.97 × 0.99 × 0.98 × 0.99 × 0.97 × 0.30) ≈ 0.239. In contrast, if the character with the reliability of 0.30 is excluded and only six characters are assumed to be correct, Pw = Pa(8, 6) × Pb(6) = 0.85 × (0.98 × 0.97 × 0.99 × 0.98 × 0.99 × 0.97) ≈ 0.753. The latter is adopted as Pw.)
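The choice of k described above can be sketched by dropping the least reliable matched characters one at a time and keeping the largest Pw. This is one possible strategy under stated assumptions: only k values present in the partial, hypothetical `PA` table are tried, and `best_correctness` is an illustrative name.

```python
from math import prod
from typing import Dict, List, Tuple

PA = {(8, 7): 0.90, (8, 6): 0.85}  # Pa(n, k) values quoted from Figure 27


def best_correctness(n: int, matched: List[float],
                     pa: Dict[Tuple[int, int], float]) -> Tuple[float, int]:
    """Drop the least reliable matched characters one at a time; keep the largest Pw."""
    ranked = sorted(matched, reverse=True)
    best_pw, best_k = 0.0, 0
    for k in range(len(ranked), 0, -1):
        if (n, k) not in pa:  # only k values present in the table are tried
            continue
        pw = pa[(n, k)] * prod(ranked[:k])  # Pa(n, k) x Pb(k) over the k most reliable
        if pw > best_pw:
            best_pw, best_k = pw, k
    return best_pw, best_k


matched = [0.98, 0.97, 0.99, 0.98, 0.99, 0.97, 0.30]
pw, k = best_correctness(8, matched, PA)
print(k, round(pw, 3))  # 6 0.753
```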

It should be noted that, in the case of a database in which no reliability is given to each character, Pa(n, k), where n is the length of the search word and k is the number of matched characters, may be used as the value indicating the correctness of a matching portion.

It should be noted that, although a non-matched character is not used as information in this case, if the non-matched character has a high reliability, that character is considered highly likely to be correct (i.e., there is probably a word that differs by only a single character, and the correctness of the search result is low). A penalty may therefore be introduced using the reliability of a non-matched character: when the reliability of the non-matched character is higher than a predetermined threshold, or when the number of such characters is greater than a predetermined value, the checked portion is not adopted even if Pw indicating its correctness is greater than the threshold.
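The penalty rule can be expressed as a simple acceptance test. The limits `reliability_limit` and `max_unmatched` are hypothetical values, `adopt` is an illustrative name, and only the Pw threshold of 0.6 comes from the example above.

```python
from typing import List


def adopt(pw: float, unmatched: List[float], pw_threshold: float = 0.6,
          reliability_limit: float = 0.9, max_unmatched: int = 1) -> bool:
    """Accept a checked portion only if Pw passes and no mismatch looks trustworthy."""
    if any(r > reliability_limit for r in unmatched):
        return False  # a confidently recognized mismatch suggests a different word
    if len(unmatched) > max_unmatched:
        return False  # too many non-matched characters
    return pw > pw_threshold


print(adopt(0.765, [0.42]))  # True: the only mismatch has a low reliability
print(adopt(0.765, [0.97]))  # False: the mismatch is probably a real difference
```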

As described above, even if a search word does not match a word in a text document in all characters, a text document containing recognition errors can be searched by using the redundancy of words. The problem of deciding how many non-matched characters can be tolerated is solved by numerically representing the correctness of a search portion with formula (1) above, using the probability table of Figure 27 produced from a database of a large amount of actual text data.

(Example 13)

In Example 13, a retrieval process is executed using the character element distance table described in Example 3 as well as the probability table described in Example 12.

In the description below, a "character element" is simply referred to as a "character".



Hereinafter, referring to Figure 29, a description will be given of a retrieval process in which a recognition result, obtained by subjecting an original document to character recognition, is searched for an eight-character search word. In this case, the recognition result is stored as document data in the memory 140. The memory 140 may be any type of storage medium.

Each character in the recognition result is given a reliability indicating the probability of the recognition result (a probability of a correct result). As in Example 12, the probability table of Figure 27 is calculated prior to the search.

In the search, as in Example 1, a reference distance to be compared with the distances between character elements is determined based on the reliability of each character. Also as in Example 1, the frequency of occurrence (probability) of a character element is determined based on the reference distance.



Pw indicating the correctness of a checked portion (search result) is represented by

Pw = Pa(n, k)   (formula 2),

where Pa(n, k) is defined in the probability table of Figure 27.

The number of characters of the recognition result itself that match the characters of the search word is six (i.e., the first, second, fifth, sixth, seventh, and eighth characters). By referring to the character element distance table, the third character is also found to match that of the search word. Eventually, the recognition result matches the search word in seven characters (the character of the recognition result having a reliability of 0.42 has a probability of 0.3 (> 0) of being the corresponding character of the search word). Therefore, Pw = Pa(8, 7) = 0.9. If the predetermined threshold is assumed to be 0.80, the value of Pw is greater than the threshold. Therefore, the correctness of the search result may be acknowledged.
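The matching step of Example 13 can be sketched with Latin letters standing in for the characters of Figure 29. The distance table, the linear mapping from reliability to reference distance in `reference_distance`, and the sample strings are all hypothetical (the text defers the actual rule to Example 1); the frequency-of-occurrence step is omitted, and Pw is computed with formula (2).

```python
from typing import List, Tuple

# Hypothetical distance table: (recognized char, search-word char) -> distance.
DISTANCE = {("0", "O"): 5, ("H", "M"): 40}
PA = {(8, 7): 0.90, (8, 6): 0.85}  # Pa(n, k) values quoted from Figure 27


def reference_distance(reliability: float, scale: float = 30.0) -> float:
    # Assumed mapping: the lower the reliability, the larger the reference distance.
    return scale * (1.0 - reliability)


def char_matches(rec: str, target: str, reliability: float, scale: float = 30.0) -> bool:
    """Exact match, or a table distance within the reference distance."""
    if rec == target:
        return True
    d = DISTANCE.get((rec, target))
    return d is not None and d <= reference_distance(reliability, scale)


def evaluate(recognized: str, reliabilities: List[float], word: str,
             scale: float = 30.0) -> Tuple[int, float]:
    """Return k and formula (2), Pw = Pa(n, k); missing table entries count as 0."""
    k = sum(char_matches(r, w, rel, scale)
            for r, w, rel in zip(recognized, word, reliabilities))
    return k, PA.get((len(word), k), 0.0)


rels = [0.99, 0.42, 0.97, 0.98, 0.90, 0.99, 0.97, 0.96]
print(evaluate("D0CUHENT", rels, "DOCUMENT"))  # (7, 0.9)
```

The low-reliability "0" gets a large reference distance and is accepted as "O", while the confidently recognized "H" is not accepted as "M", which mirrors how reliability controls the tolerance in the text.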

However, it may be desirable to reduce "search noise", that is, the retrieval of character strings other than the search character string, as much as possible. In this case, in order to make a more detailed determination of the correctness, the reference value of distance (reference distance) is increased for the non-matching fourth character (i.e., the reliability of recognition is reset to a smaller value and the reference value of distance (reference distance) is obtained again). As a result, a recognition error that cannot be detected because the frequency of occurrence is zero when the reference value of distance (reference distance) is 20 can be detected.



When a non-matched character is handled as a wildcard that stands for any character, a word that happens to differ by a single character is highly likely to be retrieved. However, if the reference value of distance (reference distance) is reset to a slightly larger value, it is possible to search only for similar characters in the character element distance table that tend to be misrecognized, under the assumption of a recognition error. As a result, search noise can be reduced.

Further, if the reliability of recognition is high even though a recognition error has occurred, the frequency of occurrence is zero (the reference value of distance (reference distance) is slightly too small because of the high reliability, whereas with a slightly larger reference value of distance the frequency of occurrence would be greater than zero). Such a situation can be avoided by resetting the reference value of distance (reference distance) to a larger value.

It should be noted that the extent of the increase in the reference value of distance (reference distance), i.e., the extent of the decrease in the reliability of recognition, may be controlled depending on the value of Pw indicating the correctness of a search result. Further, the number of characters whose reference value of distance is increased (whose reliability of recognition is decreased) may be controlled.

It should be noted that whether the reference value of distance (reference distance) is increased or a wildcard is used may be controlled depending on the value of Pw indicating the correctness of a search result.

Figure 30 shows the steps of a fuzzy retrieval process according to Example 13. This fuzzy retrieval process is executed by the CPU 110 in accordance with the document searching program.

Initially, a reference distance (initial value) to be compared with the distances between character elements is predetermined (step S3001). This reference distance may be predetermined for each character included in the search word, or shared by all of the characters included in the search word.



Matching is performed on a word-by-word basis (step S3002). Based on the result of the matching, the character string in the recognition result corresponding to the search word is evaluated (step S3003). For example, Pw given by formula (1) or (2) can be used as the evaluated value.

Thereafter, it is determined whether all characters included in the search word match the recognition result (step S3004).



When the result of the determination in step S3004 is "Yes", the evaluated value is compared with a predetermined threshold 1 (step S3005).



When the evaluated value is greater than the predetermined threshold 1, it is determined that the search word is detected in the recognition result (step S3006).



When the evaluated value is less than or equal to the predetermined threshold 1, the search result is rejected (step S3007). The search result is rejected in order to suppress detection errors, which are likely to occur when many characters in the recognition result are given a low reliability.


When a result of the determination in step S3004 is 
"No", the evaluated value is compared with a predetermined 
threshold 2 (step S3008). 

When the evaluated value is greater than the predetermined threshold 2, the reference distance is changed (step S3009), and the characters of the search word that have not matched the recognition result are subjected again to the word-by-word matching (step S3002). In step S3009, the reference distance is changed only for the characters of the search word that have not matched the recognition result. It should be noted that the reference distance may instead be changed for all characters of the search word. Further, the reference distance may be changed to a constant value greater than the initial value, or to a value that varies depending on the evaluated value (a variable value greater than the initial value).

When the evaluated value is less than or equal to the predetermined threshold 2, it is determined that the search word was not retrieved from the recognition result (step S3010).

It should be noted that a predetermined upper limit n is placed on the number of returns from step S3009 to step S3002. In this example, n = 2. The upper limit is provided because, once the evaluated value exceeds the predetermined threshold 2, the reference distance is changed in step S3009 and the process returns to step S3002 until all characters included in the search word match the recognition result. If the characters of the search word never all match the recognition result, no matter how much the reference distance is increased, the process may fall into an endless loop. The upper limit n prevents such an endless loop.
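The flow of Figure 30 can be sketched as a loop. In this sketch `evaluate` is a hypothetical caller-supplied function that performs the word-by-word matching for a given reference distance and returns whether every character matched together with the evaluated value. A single shared reference distance is enlarged by a fixed `step` (the text also allows per-character or value-dependent changes), and reporting "not retrieved" once the upper limit is reached is an assumption.

```python
from typing import Callable, Tuple


def fuzzy_retrieve(evaluate: Callable[[float], Tuple[bool, float]],
                   initial_distance: float, threshold1: float, threshold2: float,
                   step: float, max_returns: int = 2) -> str:
    distance = initial_distance                      # S3001: initial reference distance
    returns = 0
    while True:
        all_matched, value = evaluate(distance)      # S3002 matching, S3003 evaluation
        if all_matched:                              # S3004
            # S3005: compare with threshold 1 -> S3006 detected or S3007 rejected
            return "detected" if value > threshold1 else "rejected"
        if value <= threshold2 or returns >= max_returns:  # S3008, or upper limit n
            return "not retrieved"                   # S3010
        distance += step                             # S3009: enlarge reference distance
        returns += 1


def needs_distance_25(distance: float) -> Tuple[bool, float]:
    # Toy evaluator: the last unmatched character matches once distance >= 25.
    return distance >= 25, 0.9


print(fuzzy_retrieve(needs_distance_25, 10, 0.8, 0.6, step=10))  # detected
print(fuzzy_retrieve(needs_distance_25, 10, 0.8, 0.6, step=5))   # not retrieved
```

With a step of 5 the two permitted returns only reach a reference distance of 20, so the upper limit stops the loop, which is the endless-loop protection described above.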

It should be noted that it is preferable to display a probable search result differently from a non-probable one, so as to make the probability of the search result clear. For example, the display method for the search result may be changed depending on the evaluated value.

It should be noted that the retrieval process described in Example 12 is the same as that described in Example 13 except for step S3001 of Figure 30 and the case where the result of the determination in step S3008 is "Yes". In Example 12, when the result of the determination in step S3008 is "Yes", it is determined that the search word is detected in the recognition result, and the retrieval process then ends.

(Example 14)

Example 14 is a retrieval process obtained by 
modifying those described in Examples 12 and 13. 



In the description below, a "character element" is simply referred to as a "character".

Hereinafter, referring to Figure 31, a description will be given of a retrieval process in which a recognition result, obtained by subjecting an original document to character recognition, is searched for a ten-character search word. In this case, the recognition result is stored as document data in the memory 140. The memory 140 may be any type of storage medium.

Each character in the recognition result is given a reliability indicating the probability of the recognition result (a probability of a correct result). As in Examples 12 and 13, the probability table of Figure 27 is calculated prior to the search.

In the search, it is first determined whether the search word can be divided into a plurality of words. For this determination, a previously prepared word dictionary is used, for example. In this example, it is assumed that the two words making up the search word are both present in the previously prepared word dictionary. In this case, the search word is divided into these two words.

In the search, the document data is searched for a portion in which the second word follows the first word. The search for each word is performed in a manner similar to that described in Examples 12 and 13.

If the search is performed without dividing the search word into two words, the search word matches a ten-character word in the document data in eight characters, namely the first, second, fourth, fifth, sixth, seventh, eighth, and tenth characters. Since Pa(10, 8) is typically a large value, a detection error easily occurs, resulting in search noise. The search also easily succeeds around other portions in which that word appears, resulting in frequent search noise. Therefore, when a long search word is made up of a plurality of words, the search word is divided into those words using a word dictionary or the like, thereby making it possible to reduce search noise.
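Dividing a search word with a word dictionary can be sketched as follows. The longest-piece-first segmentation, the function name, and the Latin example words are assumptions for illustration; each resulting piece would then be scored as in Examples 12 and 13, with the second piece required to follow the first in the document data.

```python
from functools import lru_cache
from typing import List, Optional, Set


def split_with_dictionary(word: str, dictionary: Set[str]) -> Optional[List[str]]:
    """Divide word into two or more dictionary entries, trying longest pieces first."""

    @lru_cache(maxsize=None)
    def segment(i: int):
        if i == len(word):
            return ()
        # The first piece may not be the whole word, so any result has >= 2 pieces.
        upper = len(word) - 1 if i == 0 else len(word)
        for j in range(upper, i, -1):
            if word[i:j] in dictionary:
                rest = segment(j)
                if rest is not None:
                    return (word[i:j],) + rest
        return None

    parts = segment(0)
    return list(parts) if parts else None


print(split_with_dictionary("DIAMONDRING", {"DIAMOND", "RING"}))  # ['DIAMOND', 'RING']
print(split_with_dictionary("DIAMOND", {"DIAMOND"}))              # None
```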

It should be noted that the word dictionary preferably includes not only ordinary words but also character strings that tend to be shared by a plurality of words. For example, the word dictionary may include a character string that is contained in several different words. In the search, such words are each divided into the shared character string and the remaining character string. By searching for the divided character strings, it is possible to prevent a detection error in which a different word sharing a part with the search word is detected.

INDUSTRIAL APPLICABILITY 

According to the present invention, a distance relevant to the similarity between character elements is predetermined between character elements. Whether a character element included in a recognition result matches a character element included in a search keyword is determined based on the result of comparing the distance between the character elements with a predetermined reference distance. By varying the predetermined reference distance depending on the reliability of the recognition result, it is possible to perform a search while dynamically changing the tolerance to recognition errors depending on the recognition result.

Further, distances between character elements are 
previously provided in the form of a table. Therefore, 
complicated calculation of distances is not required in the 
search. As a result, a high-speed search can be achieved. 

Furthermore, for a specific character element, a plurality of character elements that may be concatenated with the specific character element are determined. Therefore, even when the layout of an original document is incorrectly recognized, a search keyword can be appropriately retrieved from the recognition result. As a result, even when it is incorrectly recognized whether the sentences of an original document are written vertically or horizontally, or when the line that should follow a line feed is incorrectly recognized, a search keyword can be appropriately retrieved from the recognition result.

CLAIMS



1. A retrieval method for searching a first character element string obtained by subjecting a character string to character recognition for a second character element string,

wherein the first character element string includes a first character element and the second character element string includes a second character element, and

a distance relevant to a similarity between the first character element and the second character element is predetermined between the first character element and the second character element,

the retrieval method comprising the steps of:

comparing the distance with a first predetermined reference distance; and

determining whether the second character element matches the first character element based on a result of the comparison of the distance with the first predetermined reference distance.

2. A retrieval method according to claim 1, wherein for the first character element, a reliability of character recognition is predetermined, and

the first predetermined reference distance is determined based on the reliability.

3. A retrieval method according to claim 1, wherein the predetermined first reference distance is determined based on user input.



4. A retrieval method according to claim 1, further comprising the steps of:

changing the first predetermined reference distance to a second reference distance;

comparing the distance with the second reference distance; and

determining whether the second character element matches the first character element based on a result of the comparison of the distance with the second reference distance.

5. A retrieval method according to claim 1, wherein a plurality of distances relevant to the similarity between the first character element and the second character element are predetermined between the first character element and the second character element, and

one distance selected from the plurality of distances is used as the distance.

6. A retrieval method according to claim 5, wherein the one of the plurality of distances is determined based on user input.

7. A retrieval method according to claim 1, wherein the 
distance has a probabilistic distribution. 

8. (Amended) A retrieval method for searching a first character element string obtained by subjecting a character string to character recognition for a second character element string,

wherein the first character element string includes a plurality of character elements, and

for a specific character element of the plurality of character elements, character elements at a plurality of locations having the possibility of being concatenated with the specific character element are predetermined,

the retrieval method comprising the steps of:

determining whether a character element string obtained by concatenating the specific character element of the character elements at a plurality of locations with one character element of the plurality of character elements, the one character element being different from the specific character element, matches at least a part of the second character element string.



9. A retrieval method according to claim 8, comprising the steps of:

selecting one character element from the plurality of character elements having the possibility of being concatenated with the specific character element; and

determining whether a character element string obtained by concatenating the specific character element with the selected character element matches at least a part of the second character element string.



10. A retrieval method according to claim 8, wherein the specific character element is located at an end of a row or column, and the plurality of character elements having the possibility of being concatenated with the specific character element are each located at a head of a row or column.



11. A retrieval method according to claim 8, wherein the specific character element and one of the plurality of character elements having the possibility of being concatenated with the specific character element are located in the same row or column, and

the specific character element and another one of the plurality of character elements having the possibility of being concatenated with the specific character element are located in different rows or columns and in the same column or row.

12. A retrieval method for searching a first character element string obtained by subjecting a character string to character recognition for a second character element string,

wherein the first character element string includes at least one first character element and the second character element string includes at least one second character element,

the retrieval method comprising the steps of:

obtaining a probability that a search result matches the second character element string, based on the number of the second character elements included in the second character element string and the number of the second character elements matching the corresponding first character elements out of the second character elements included in the second character element string; and

determining the correctness of the search result based on the probability.

13. A retrieval method according to claim 12, wherein a distance relevant to a similarity between the first character element and the second character element is predetermined between the second character element and the corresponding first character element, and

the retrieval method further comprising the steps of:

comparing the distance with a predetermined reference distance; and

determining whether the second character element matches the corresponding first character element based on a result of the comparison of the distance with the predetermined reference distance.

14. A retrieval method according to claim 13, further 
comprising the step o£i 

for a second character element out of the at least 
one second character element included in the second 
character element string, said second character element not 
matching a corresponding first character element included 
in the first character element string, after resetting a 
predetermined reference distance, determining whether said
second character element matches the corresponding first 
character element using the reset predetermined reference 
distance. 
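Claim 14 re-examines only the elements that failed the first comparison, using a reset reference distance. A minimal sketch, assuming a hypothetical distance table and assuming that the reset distance is looser than the initial one (the claim says only that the reference distance is reset):

```python
import math

# Hypothetical predetermined distances (keyword char, recognized char).
DISTANCES = {("B", "8"): 1.0, ("P", "F"): 2.0}


def distance(keyword_char: str, recognized_char: str) -> float:
    if keyword_char == recognized_char:
        return 0.0
    return DISTANCES.get((keyword_char, recognized_char), math.inf)


def match_with_reset(keyword: str, recognized: str,
                     initial_ref: float, reset_ref: float) -> list[bool]:
    """Per-position match flags. Only positions that fail against
    initial_ref are re-checked, once, against the reset reference distance."""
    flags = [distance(k, r) <= initial_ref for k, r in zip(keyword, recognized)]
    for i, ok in enumerate(flags):
        if not ok:
            flags[i] = distance(keyword[i], recognized[i]) <= reset_ref
    return flags


if __name__ == "__main__":
    print(match_with_reset("BAP", "8AF", initial_ref=0.5, reset_ref=1.5))
    # [True, True, False]
```

Elements that already matched are left untouched, so relaxing the threshold can only add matches, never remove them.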

15. A retrieval method according to claim 12, further 
comprising the step of: 

dividing the second character element string into 
a plurality of character element portions.
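Claim 15 does not say how the keyword is divided or how the portions are then used. The sketch below assumes fixed-length, non-overlapping portions (the `portion_length` parameter is hypothetical) and simply reports which portions occur in the recognized text, so that a portion untouched by misrecognition can still point to a candidate location.

```python
def divide(keyword: str, portion_length: int = 2) -> list[str]:
    """Split the keyword into consecutive, non-overlapping portions.
    The last portion may be shorter than portion_length."""
    if portion_length < 1:
        raise ValueError("portion_length must be at least 1")
    return [keyword[i:i + portion_length]
            for i in range(0, len(keyword), portion_length)]


def portions_found(recognized: str, portions: list[str]) -> list[bool]:
    """Report, for each portion, whether it occurs in the recognized text."""
    return [p in recognized for p in portions]


if __name__ == "__main__":
    parts = divide("RETRIEVAL", 3)
    print(parts)  # ['RET', 'RIE', 'VAL']
    # "I" was misrecognized as "l", so only the middle portion is lost.
    print(portions_found("RETRlEVAL", parts))  # [True, False, True]
```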

16. A retrieval device for searching a first character
element string obtained by subjecting a character string
to character recognition for a second character element
string, 

wherein the first character element string includes 
a first character element and the second character element 
string includes a second character element, and 

a distance relevant to a similarity between the 
first character element and the second character element 
is predetermined between the first character element and 
the second character element, 
the retrieval device comprising:

means for comparing the distance with a 
predetermined reference distance; and

means for determining whether the second character 
element matches the first character element based on a result
of the comparison of the distance with the predetermined 
reference distance. 

17. (Amended) A retrieval device for searching a first 
character element string obtained by subjecting a character 
string to character recognition for a second character 
element string, 

wherein the first character element string includes 
a plurality of character elements, and 

for a specific character element of the plurality 
of character elements, character elements at a plurality 
of locations having the possibility of being concatenated 
with the specific character element are predetermined, 

the retrieval device comprising: 

means for determining whether a character element 
string obtained by concatenating the specific character
element of the character elements at a plurality of locations
with one character element of the plurality of character
elements, the one character element being different from
the specific character element, matches at least a part of
the second character element string. 
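Claim 17 leaves open which locations are predetermined as concatenation partners. Following the row and column language at the start of this section, the sketch below assumes three candidate partners for an element on a page held as a list of row strings: the next element in the same row, the element below in the same column, and, for the last element of a row, the first element of the next row (a line break). The grid representation, the one-character-per-element assumption, and all names are hypothetical.

```python
from typing import List, Tuple

Pos = Tuple[int, int]  # (row, column) of an element on the recognized page


def candidate_partners(grid: List[str], pos: Pos) -> List[Pos]:
    """Hypothetical predetermined locations that may be concatenated after
    the element at pos: next in the same row, below in the same column,
    and, for the last element of a row, the first element of the next row."""
    r, c = pos
    partners: List[Pos] = []
    if c + 1 < len(grid[r]):
        partners.append((r, c + 1))
    if r + 1 < len(grid):
        if c < len(grid[r + 1]):
            partners.append((r + 1, c))
        if c == len(grid[r]) - 1 and (r + 1, 0) not in partners:
            partners.append((r + 1, 0))
    return partners


def matching_concatenations(grid: List[str], pos: Pos,
                            keyword: str) -> List[Tuple[Pos, str]]:
    """Concatenate the element at pos with each candidate partner and keep
    the pairs that occur somewhere in the keyword."""
    r, c = pos
    hits = []
    for pr, pc in candidate_partners(grid, pos):
        pair = grid[r][c] + grid[pr][pc]
        if pair in keyword:
            hits.append(((pr, pc), pair))
    return hits


if __name__ == "__main__":
    page = ["ABC", "DEF"]
    print(matching_concatenations(page, (0, 2), "XCDY"))  # [((1, 0), 'CD')]
    print(matching_concatenations(page, (0, 1), "BE"))    # [((1, 1), 'BE')]
```

Checking several partner locations lets a keyword be found even when it wraps across a line break or when the reading direction (horizontal or vertical) of the page is uncertain.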

18. A retrieval device for searching a first character 
element string obtained by subjecting a character string 
to character recognition for a second character element 
string, 

wherein the first character element string includes 
at least one first character element and the second character 
element string includes at least one second character
element,

the retrieval device comprising:
means for obtaining a probability that a search
result matches the second character element string, based
on the number of the second character elements included in
the second character element string, and a number of the
second character elements matching the corresponding first
character elements out of the second character elements
included in the second character element string; and

means for determining the correctness of the search
result based on the probability.

19. A computer readable recording medium in which a program
for causing a computer to execute a retrieval process for
searching a first character element string obtained by
subjecting a character string to character recognition for
a second character element string is recorded,

wherein the first character element string includes
a first character element and the second character element
string includes a second character element, and

a distance relevant to a similarity between the
first character element and the second character element
is predetermined between the first character element and
the second character element,

the retrieval process comprising the steps of:

comparing the distance with a predetermined
reference distance; and

determining whether the second character element
matches the first character element based on a result of
the comparison of the distance with the predetermined
reference distance.
20. (Amended) A computer readable recording medium in which
a program for causing a computer to execute a retrieval 
process for searching a first character element string 
obtained by subjecting a character string to character 
recognition for a second character element string is 
recorded, 

wherein the first character element string includes 
a plurality of character elements, and 

for a specific character element of the plurality 
of character elements, character elements at a plurality 
of locations having the possibility of being concatenated 
with the specific character element are predetermined,

the retrieval process comprising the steps of: 

determining whether a character element string 
obtained by concatenating the specific character element 
of the character elements at a plurality of locations with 
one character element of the plurality of character elements, 
the one character element being different from the specific 
character element, matches at least a part of the second 
character element string. 

21. A computer readable recording medium in which a program 
for causing a computer to execute a retrieval process for
searching a first character element string obtained by 
subjecting a character string to character recognition for 
a second character element string is recorded, 

wherein the first character element string includes 
at least one first character element and the second character 
element string includes at least one second character 
element, 

the retrieval process comprising the steps of: 
obtaining a probability that a search result matches 
the second character element string, based on the number
of the second character elements included in the second
character element string, and a number of the second
character elements matching the corresponding first
character elements out of the second character elements
included in the second character element string; and

determining the correctness of the search result
based on the probability. 